Abstract

It has been suggested that the algebraic structure of AES (and other similar block 
ciphers) could lead to a weakness exploitable in new attacks. In this paper, we use 
the algebraic structure of AES-like ciphers to construct a novel cipher embedding 
where the ciphers may lose their non-linearity. We show some examples and we 
discuss the limitations of our approach. 
Introduction

The Advanced Encryption Standard (AES) [Nat01] is nowadays the most
widespread block cipher in commercial applications. It represents the state
of the art in block cipher design and provides an unparalleled level of
assurance against all known cryptanalytic techniques, except for its
round-reduced versions. It is true that AES (and other modern block ciphers)
presents a highly algebraic structure, which has led researchers to try to
exploit it in new algebraic attacks, but these attempts have been
unsuccessful so far (except for academic reduced versions).

The best that one can hope for in a cryptosystem is that all its encryption
functions behave in an unpredictable way (close to random); in particular, we
would like them to behave in a way totally different from linear or affine maps.

A sign of strength for AES is that nobody has been able to show that 
its encryption functions are any closer to linear maps than arbitrary random 
functions. 
However, it might be possible to extend AES to act on bigger spaces, in
such a way that the non-random behavior of AES becomes easier to spot.
For example, it was hoped that embedding AES into BES would allow easier
polynomial systems to break the cipher (see [MR02], [TZ05]). Generally
speaking, the worst scenario consists of a space large enough to make AES
linear but small enough to allow practical computations. This is probably not
possible. Our goal is to find a space small enough to allow practical
computations but large enough to identify a specific behavior of AES, showing
that it is closer to linear maps than expected.

In Section 1, after some basic algebraic background, we explain our point of 
view on block ciphers. In particular, we introduce the class of translation based 
cryptosystems, which are ciphers enjoying some interesting algebraic proper- 
ties. We also briefly describe the three main translation-based cryptosystems: 
AES, SERPENT and PRESENT. 

For completeness, in Section 2 we list the best-known attacks on round- 
reduced versions of AES. 

In Section 3 we provide formal techniques to construct a larger space on 
which the block cipher can act. We call these techniques space embeddings. 
In the case of translation-based ciphers, these embeddings are designed to 
lower the non-linearity of the encryption functions. We present one specific 
embedding and we obtain several results on the rank distributions for matrices 
in the larger space, which are useful to mount attacks. 

In Section 4 we present a larger embedding, that apparently works well 
with AES and other translation-based systems. The effectiveness of this em- 
bedding depends heavily on properties of the mixing-layer. 

In Section 5 we outline our approach to attack translation-based ciphers
(including AES) with our embeddings. Although we have not been able to find
an attack giving satisfactory statistical evidence, we have some partial data
suggesting that our methods may work, as reported in [RSB10].

In Section 6 we discuss further on our non-linearity notion: 

• first, we report results from [Mai09],[MRS10] on embeddings where the 
decrease in non-linearity can be formally proved; 

• then, we propose alternative embeddings highlighting their flaws; 

• finally, with group theory proofs we also show that it is very unlikely that 
a representation/embedding can completely linearize any version of AES. 



(Here "easier" means easier than systems coming from random maps.)
1 Preliminaries

In this section we recall well-known results in group theory and finite field
theory [LN97] in order to fix the notation we will use in the sequel. We also
outline some basic ideas about block ciphers and we recall the structure of
three well-known cryptosystems: AES, SERPENT and PRESENT.

1.1 Group representations 

Let $n > 2$ be an integer. Let $V = (\mathbb{F}_2)^n$ be the vector space over
the finite field $\mathbb{F}_2$ of dimension $n$. We denote by $\mathrm{Sym}(V)$
and $\mathrm{Alt}(V)$, respectively, the symmetric and alternating group on $V$.
For any $N$, we denote by $\mathrm{Sym}_N$ and $\mathrm{Alt}_N$, respectively,
the symmetric and alternating group on $\{1, \dots, N\}$. Clearly
$\mathrm{Sym}(V)$ is isomorphic to $\mathrm{Sym}_{2^n}$ (the same for the
alternating group). We denote by $\mathrm{GL}(V)$ the group of all linear
permutations of $V$. We recall the well-known formulas:

$$|\mathrm{Sym}(V)| = 2^n!, \qquad |\mathrm{Alt}(V)| = \frac{2^n!}{2}, \qquad |\mathrm{GL}(V)| = \prod_{h=0}^{n-1} (2^n - 2^h) < 2^{n^2}.$$
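As a small sanity check of the last formula, the following Python sketch compares the product formula with a brute-force count of invertible matrices over $\mathbb{F}_2$ for tiny $n$. The function names are illustrative choices, not part of the paper, and rows are encoded as $n$-bit integers purely for convenience.

```python
from itertools import product
from math import factorial


def gl_order(n: int) -> int:
    """|GL(V)| for V = (F_2)^n via the product formula prod_{h=0}^{n-1} (2^n - 2^h)."""
    order = 1
    for h in range(n):
        order *= 2**n - 2**h
    return order


def gl_order_bruteforce(n: int) -> int:
    """Count n x n matrices over F_2 whose rows span all of (F_2)^n.

    Each row is an n-bit integer; the span is built by closing under XOR.
    Only feasible for very small n.
    """
    count = 0
    for rows in product(range(2**n), repeat=n):
        span = {0}
        for r in rows:
            span |= {s ^ r for s in span}
        if len(span) == 2**n:
            count += 1
    return count


for n in (2, 3):
    print(n, gl_order(n), gl_order_bruteforce(n), factorial(2**n), 2 ** (n * n))
```

Already for $n = 3$ the linear group is much smaller than $\mathrm{Sym}(V)$, which is the gap between linear maps and arbitrary permutations that the rest of the paper is concerned with.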

Given a finite group $G$, we say that $G$ can be linearized if there is an
injective morphism $\rho : G \to \mathrm{GL}(V)$ (this is called a "faithful
representation" in representation theory). If $G$ can be linearized, then, for
any element $g \in G$, we can compute a matrix $M_g$ corresponding to the action
of $g$ over $V$ (via $\rho$). The matrix computation is easy, since it is enough
to evaluate $g$ on a basis of $V$.
If $\rho : G \to \mathrm{GL}(V)$ is a representation of $G$ on $V$, then we
often write $vg$ instead of $v\rho(g)$, if no confusion arises. Also, $G$ is
said to act linearly on $V$, and $V$ is called a $G$-module. The degree of the
representation is by definition the dimension of $V$. If we consider
$\mathrm{Sym}_N$, we can always linearize $\mathrm{Sym}_N$ over $V$ via the
so-called regular representation as follows.

Let $V$ be a vector space with basis $\{e_1, \dots, e_N\}$. The regular
representation $\rho : \mathrm{Sym}_N \to \mathrm{GL}(V)$ is defined by
$(e_i)\rho(g) = e_{ig}$. In other words, any permutation in $\mathrm{Sym}_N$ is
associated to a permutation ($N \times N$) matrix (and vice versa). Since any
finite group $G$ can be embedded in $\mathrm{Sym}_N$ for a smallest $N$, we can
always linearize $G$ using the regular representation. But of course this is
huge and usually impractical.
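The representation by permutation matrices can be sketched in a few lines of Python. This is only an illustration under our own conventions: permutations are 0-indexed lists, vectors are row vectors acted on from the right (matching the notation $vg$), and the helper names are hypothetical.

```python
def perm_matrix(g):
    """Matrix of rho(g) on row vectors: row i has its 1 in column g[i],
    so that e_i * rho(g) = e_{g[i]}."""
    n = len(g)
    return [[1 if g[i] == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Plain matrix product of two square integer matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def compose(g, h):
    """Right action: apply g first, then h, i.e. i -> h[g[i]]."""
    return [h[x] for x in g]


g = [1, 2, 0, 3]  # a 3-cycle on {0, 1, 2}
h = [0, 3, 2, 1]  # a transposition of 1 and 3

# rho is a homomorphism: rho(g) rho(h) = rho(gh)
print(mat_mul(perm_matrix(g), perm_matrix(h)) == perm_matrix(compose(g, h)))
```

For a block cipher on $V = (\mathbb{F}_2)^{128}$ the corresponding matrices would have $2^{128}$ rows, which is why this linearization is of no practical use.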

1.2 Finite Fields 

For any prime $p$ and any positive $m \in \mathbb{N}$, $\mathbb{F}_{p^m}$ is the
field with $p^m$ elements (unique up to field isomorphism). It contains an
isomorphic copy of $\mathbb{F}_p$ and can thus be thought of as an extension of
$\mathbb{F}_p$. On the other hand, we can construct any $\mathbb{F}_{q^m}$ from
$\mathbb{F}_q$, with $q$ a power of $p$, as follows.

Let $f \in \mathbb{F}_q[x]$ be an irreducible polynomial of degree $m$. We can
consider the quotient $R = \mathbb{F}_q[x]/(f)$, where $(f)$ is the ideal
generated by $f$ in $\mathbb{F}_q[x]$. By considering the natural projection
$\pi : \mathbb{F}_q[x] \to R$, we call $\alpha = \pi(x)$ and clearly any element
of $R$ can be uniquely expressed as a polynomial in $\alpha$ of degree less
than $m$:

$$R = \Big\{ \sum_{i=0}^{m-1} a_i \alpha^i \;\Big|\; a_i \in \mathbb{F}_q \Big\},$$

with the condition $f(\alpha) = 0$.

Theorem 1.1. $R = \mathbb{F}_q[x]/(f)$ is a field and $R \cong \mathbb{F}_{q^m}$.

We denote by $\mathbb{F}_q^*$ the multiplicative group of non-zero elements of
$\mathbb{F}_q$.

Theorem 1.2. For any finite field $\mathbb{F}_q$, the multiplicative group
$\mathbb{F}_q^*$ is cyclic.

A generator of the cyclic group $\mathbb{F}_q^*$ is called a primitive element
of $\mathbb{F}_q$.

Definition 1.3. An irreducible polynomial $f \in \mathbb{F}_q[x]$ is primitive
if its roots are primitive elements.

Note that for any $q$ and $m$ there are indeed irreducible polynomials of
degree $m$ over $\mathbb{F}_q$ and some of them are primitive.

1.3 Permutation polynomials 

Definition 1.4. A polynomial $f \in \mathbb{F}_q[x]$ is a permutation polynomial
of $\mathbb{F}_q$ if the associated polynomial function $f : c \mapsto f(c)$
from $\mathbb{F}_q$ into $\mathbb{F}_q$ is a permutation of $\mathbb{F}_q$. If
$f$ is an affine map $f : x \mapsto ax + b$ ($a \neq 0$), we say that $f$ is a
linear polynomial.

We note the following easy results:

(1) Every linear polynomial over $\mathbb{F}_q$ is a permutation polynomial of
$\mathbb{F}_q$.

(2) The monomial $x^n$ is a permutation polynomial of $\mathbb{F}_q$ if and
only if $\gcd(n, q - 1) = 1$.
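Result (2) is easy to check by enumeration when $q$ is a small prime, so that $\mathbb{F}_q$ is simply the integers modulo $q$. The sketch below makes that simplifying assumption (prime $q$ only), and the function name is our own.

```python
from math import gcd


def is_permutation_monomial(n: int, q: int) -> bool:
    """Check by enumeration whether c -> c^n permutes F_q.

    Assumes q is prime, so that F_q can be modelled as Z/qZ.
    """
    return len({pow(c, n, q) for c in range(q)}) == q


for q in (7, 11, 13):
    for n in range(1, q):
        # the enumeration agrees with the gcd criterion
        assert is_permutation_monomial(n, q) == (gcd(n, q - 1) == 1)

print([n for n in range(1, 7) if is_permutation_monomial(n, 7)])
```

The same criterion explains why the inversion $x \mapsto x^{q-2}$ is a permutation for $q > 2$: $q - 2$ and $q - 1$ are consecutive integers, hence coprime.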

Permutation polynomials of $\mathbb{F}_q$ of degree less than $q$ can be
combined by the operation of composition and subsequent reduction modulo
$x^q - x$. The set of permutation polynomials of $\mathbb{F}_q$ of degree less
than $q$ forms a group, which is isomorphic to $\mathrm{Sym}(\mathbb{F}_q)$.
Then, the symmetric group $\mathrm{Sym}(\mathbb{F}_q)$ and its subgroups can be
represented as groups of permutation polynomials.

Theorem 1.5. For $q > 2$, the symmetric group $\mathrm{Sym}(\mathbb{F}_q)$ is
generated by $x^{q-2}$ and all linear polynomials over $\mathbb{F}_q$.



A. Rimoldi, M. Sala, I. Toll 






1.4 Block ciphers 

Block ciphers form an important class of cryptosystems in symmetric key
cryptography. These are algorithms that encrypt and decrypt blocks of data
(with fixed length) according to a shared secret key. We can formally describe
such a cryptosystem using the following definition:

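Definition 1.6. A cryptosystem is a pair $(\mathcal{M}, \mathcal{K})$, where:

• $\mathcal{M}$ is a finite set of possible messages (plaintexts, ciphertexts);

• $\mathcal{K}$, the key-space, is a finite set of possible keys;

• we have encryption and decryption functions for any key $k \in \mathcal{K}$:

$$\varphi_k : \mathcal{M} \to \mathcal{M}, \qquad \psi_k : \mathcal{M} \to \mathcal{M}, \qquad \varphi_k, \psi_k \in \mathrm{Sym}(\mathcal{M})$$

such that $\psi_k(\varphi_k(x)) = x$ for all $x \in \mathcal{M}$.

Following the most used structure in modern ciphers, in the previous
definition we set that the plaintext space coincides with the ciphertext space.
W.l.o.g., we can consider $\mathcal{M} = (\mathbb{F}_q)^r$ and
$\mathcal{K} = (\mathbb{F}_q)^\ell$, with $r$ and $\ell$ positive integers, and
we change slightly our previous definition.

Definition 1.7. Let $r$ and $\ell$ be natural numbers. Let $\varphi$ be any
function

$$\varphi : (\mathbb{F}_q)^r \times (\mathbb{F}_q)^\ell \to (\mathbb{F}_q)^r.$$

For any $k \in (\mathbb{F}_q)^\ell$, we denote by $\varphi_k$ the function

$$\varphi_k : (\mathbb{F}_q)^r \to (\mathbb{F}_q)^r, \qquad \varphi_k(x) = \varphi(x, k).$$

We say that $\varphi$ is an algebraic block cipher if $\varphi_k$ is a
permutation of $(\mathbb{F}_q)^r$ for any key $k \in (\mathbb{F}_q)^\ell$.

Under these conditions, we can also consider a block cipher as an indexed set
of permutations $(\mathbb{F}_q)^\ell \to \mathrm{Sym}((\mathbb{F}_q)^r)$. Any key
$k \in \mathcal{K}$ induces a permutation $\varphi_k$ on $\mathcal{M}$. Since
$\mathcal{M}$ is usually $V = (\mathbb{F}_2)^r$ for some $r \in \mathbb{N}$, we
can consider $\varphi_k \in \mathrm{Sym}(V)$.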

To achieve the desired security, most modern block ciphers are iterated 
ciphers that typically incorporate sequences of permutation and substitution 
operations. In fact, according to the ideas that Shannon proposed in his sem- 
inal paper [Sha49], the encryption process takes as input a plaintext and a 



^ Actually, there is a recent approach that allows a slight change of the block length 
[CYK09] 
random key and so proceeds through similar rounds. In each round (except
possibly for a couple, which may be slightly different) the iterated ciphers
perform a non-linear substitution operation (or S-box) on disjoint parts of
the input that provides "confusion", followed by a permutation (usually a
linear/affine transformation) on the whole data that provides "diffusion". A
cryptosystem reaches "confusion" if the relationship between plaintext,
ciphertext and key is very complicated. The "diffusion" idea consists of
spreading the influence of all parts of the input (plaintext and key) to all
parts of the ciphertext. The operations performed in a round form the round
function. The round function at the $p$-th round ($1 \le p \le N$) takes as
inputs both the output of the $(p-1)$-th round and the subkey $k^{(p)}$ (also
called round-key). Any round key $k^{(p)}$ is constructed starting from a master
key $k$ of some specified length, e.g. $k \in \mathcal{K} = (\mathbb{F}_2)^\ell$
(nowadays $|\mathcal{K}| \le 2^{256}$). The key schedule is a public algorithm
(strictly dependent on the cipher) which constructs $N + 1$ subkeys
$(k^{(0)}, \dots, k^{(N)})$.

Several independent formal definitions have been proposed for iterated 
block ciphers (or subclasses of them). Stinson in [Sti95] gives the following 
definition of substitution permutation network (SPN for short). In [DR02] we
can find another class of iterated block ciphers, called the key-alternating
block ciphers.

Now, we consider a more recent definition [CDS09] that defines a class (see 
Definition 1.9), large enough to include some common ciphers, yet restricted 
enough to have simple criteria guaranteeing an interesting property of the 
cipher (for details see Subsection 6.3). 

Let $V = (\mathbb{F}_2)^r$ with $r = mb$, $b \ge 2$. The vector space $V$ is a
direct sum

$$V = V_1 \oplus \cdots \oplus V_b,$$

where each $V_i$ has the same dimension $m$ (over $\mathbb{F}_2$). For any
$v \in V$, we will write $v = v_1 \oplus \cdots \oplus v_b$, where
$v_i \in V_i$. Also, we consider the projections $\pi_i : V \to V_i$ mapping
$v \mapsto v_i$.

Any $\gamma \in \mathrm{Sym}(V)$ that acts as
$v\gamma = v_1\gamma_1 \oplus \cdots \oplus v_b\gamma_b$, for some
$\gamma_i \in \mathrm{Sym}(V_i)$, is a bricklayer transformation (a "parallel
map") and any $\gamma_i$ is a brick. The maps $\gamma_i$ are traditionally
called S-boxes and the map $\gamma$ is called a "parallel S-box". A linear (or
affine) map $\lambda : V \to V$ is traditionally called a "Mixing Layer" when
used in composition with parallel maps. For any $v \in V$, we denote by
$\sigma_v$ the translation over $V$ mapping $x \mapsto x + v$.

Definition 1.8. A linear map $\lambda \in \mathrm{GL}(V)$ is a proper mixing
layer if no sum of some of the $V_i$ (except $\{0\}$ and $V$) is invariant
under $\lambda$.

We can characterize the "translation based" class by the following 



^ also called session key. 
Definition 1.9. We say that $\mathcal{C}$ is translation based (tb) if:

• it is the composition of a finite number of rounds, such that any round
can be written as $\gamma\lambda\sigma_{\bar{k}}$, where

■ $\gamma$ is a round-dependent bricklayer transformation (but it does not
depend on $k$),

■ $\lambda$ is a round-dependent linear map (but it does not depend on $k$),

■ $\bar{k}$ is in $V$ and depends on both $k$ and the round ($\bar{k}$ is
called a "round key");

• for at least one round we have (at the same time) that $\lambda$ is proper
and that the map $\mathcal{K} \to V$, $k \mapsto \bar{k}$, is surjective (a
"proper" round).
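To make the shape of a round $\gamma\lambda\sigma_{\bar{k}}$ concrete, here is a toy sketch on $V = (\mathbb{F}_2)^8$ with $b = 2$ bricks of $m = 4$ bits. Everything in it is an assumption made for illustration: the 4-bit S-box is borrowed from the PRESENT table given later in the text, the mixing layer is an arbitrary bit rotation, the round keys are arbitrary constants, and none of the function names come from the paper.

```python
# 4-bit S-box (the PRESENT brick), used here only as a convenient bijection
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def gamma(v: int) -> int:
    """Bricklayer map: apply the S-box to each 4-bit half of the byte."""
    return (SBOX[v >> 4] << 4) | SBOX[v & 0xF]


def lam(v: int) -> int:
    """Toy linear mixing layer: rotate the 8 bits left by 2 positions."""
    return ((v << 2) | (v >> 6)) & 0xFF


def sigma(v: int, k: int) -> int:
    """Translation by the round key k (XOR is addition in (F_2)^8)."""
    return v ^ k


def round_fn(v: int, k: int) -> int:
    """One round: first gamma, then lambda, then the key translation."""
    return sigma(lam(gamma(v)), k)


def encrypt(v: int, round_keys) -> int:
    """Round 0 is a pure translation; the remaining rounds are typical rounds."""
    v = sigma(v, round_keys[0])
    for k in round_keys[1:]:
        v = round_fn(v, k)
    return v


keys = [0x3A, 0xC5, 0x7E]  # arbitrary toy round keys
images = {encrypt(v, keys) for v in range(256)}
print(len(images) == 256)  # the toy cipher permutes V for this key
```

The rotation moves bits of each 4-bit brick into the other one, so neither $V_1$ nor $V_2$ is invariant and this toy $\lambda$ is a proper mixing layer in the sense of Definition 1.8.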

In [CDS09] the authors gave several non-trivial remarks that can be useful. 
Let us recall the principal ones. 

Remark 1.10. A generalization is obtained by allowing a key-independent per- 
mutation at the beginning and/or another at the end. This is the case for ex- 
ample for the SERPENT cipher. Since these permutations have no influence 
on the cryptanalysis of a cipher, they can be ignored. 

Remark 1.11. A round consisting of only a translation is still acceptable, by
assuming $\gamma = \lambda = 1_V$ (the identity map on $V$), although obviously
it is not proper. Indeed, we can always assume that the first round is of this
kind, otherwise we can remove its $\gamma$ and $\lambda$ (Remark 1.10). Then,
we can also assume that $0\gamma = 0$, since we can add $0\gamma$ to the round
key of the previous round. If the previous round is proper, it remains proper
since $\sigma_{0\gamma}$ is a permutation over $V$.

Remark 1.12. To allow affine mixing layers, rather than linear mixing layers,
seems a generalization. However, this case is indeed already present in
Definition 1.9, since it is enough to change the round key to incorporate the
"translation part" of the mixing layer.

Remark 1.13. A generalization can be obtained by only requiring at least 
one of the rounds to be of the prescribed form (with a proper mixing layer). 
Although the authors' results still hold in this more general case, we do not 
know any interesting cipher of this kind. 

Note that some famous ciphers, such as the DES, KASUMI and IDEA 
ciphers, cannot be seen easily as tb ciphers. Some of them (e.g. DES and 
KASUMI) are of Feistel type. They modify only one half of the cipher state in 
each round. It has been suggested that the Feistel ciphers suffer from a slow 
speed of diffusion compared to SPN (or key-iterated) ciphers. 

In Subsections 1.5, 1.6 and 1.7 we describe AES, SERPENT and PRESENT,
respectively, as translation based cryptosystems.



^ we drop round indexes. 

^ The reader can find a full description of these cryptosystems respectively in 
[DR02], [ABK98] and [AKL+07] 
1.5 The AES-128 cryptosystem

Let $\mathcal{M} = \mathcal{K} = V = (\mathbb{F}_2)^r$ with $r = 128$ and let
$x \in \mathcal{M}$ be our plaintext, $k \in \mathcal{K}$ our random key and
$y = \varphi_k(x)$ the corresponding ciphertext. Before describing the
individual components $\gamma$, $\lambda$ and $\sigma_k$ of the round function,
we recall (see Section 1.2) that it is possible to identify $(\mathbb{F}_2)^8$
with the field $\mathbb{F}_{2^8}$, via the quotient map
$\mathbb{F}_{2^8} \cong \mathbb{F}_2[x]/(m)$, where $m \in \mathbb{F}_2[x]$ is an
irreducible polynomial such that $\deg(m) = 8$. The irreducible (but not
primitive) AES polynomial is $m = x^8 + x^4 + x^3 + x + 1$.
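The claim that the AES polynomial is irreducible but not primitive can be checked directly. The sketch below encodes a byte as the bit vector of polynomial coefficients (a common convention, assumed here) and computes the multiplicative order of the class of $x$; the helper names are ours.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of F_2[x]/(m), with m the AES polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:        # degree reached 8: reduce modulo m
            a ^= AES_POLY
        b >>= 1
    return r


def mult_order(a: int) -> int:
    """Multiplicative order of a non-zero element a."""
    k, p = 1, a
    while p != 1:
        p = gf_mul(p, a)
        k += 1
    return k


# No zero divisors among non-zero elements: the quotient is a field,
# which for a finite quotient F_2[x]/(m) means m is irreducible.
is_field = all(gf_mul(a, b) != 0 for a in range(1, 256) for b in range(1, 256))

print(is_field, mult_order(0x02), mult_order(0x03))
```

The class of $x$ (the byte 0x02) has order 51 rather than 255, so the roots of $m$ are not primitive elements, while $x + 1$ (the byte 0x03) does generate the whole multiplicative group.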

Internally, the AES algorithm's operations are performed on a two-dimensional
array of bytes, called the State. It consists of 4 rows and 4 columns and
each element of this matrix is one byte (i.e. an element of
$\mathbb{F}_{2^8} = \mathbb{F}_{256}$). At the start of the encryption process,
the input $x$ (the plaintext) is a vector in $V$ and it is first changed into a
16-byte vector:

$$\nu : (\mathbb{F}_2)^{128} \to (\mathbb{F}_{256})^{16}, \qquad x \mapsto \nu(x).$$

Each round performs its operations on the State and after the last round the
State is "unwrapped" and "fills up" the output vector.

A preliminary translation $\sigma_{k^{(0)}}$, where $k^{(0)} \in (\mathbb{F}_2)^r$
is the first round key, is applied to the plaintext to form the input to
Round 1. It means that we can consider a preliminary round (Round 0) such that
$\gamma = 1_V$ and $\lambda = 1_V$ (see Remark 1.11).

In order to obtain the ciphertext, other $N = 10$ rounds follow.

Let $1 \le p \le N - 1$. A typical round (Round $p$) can be written as the
composition $\gamma\lambda\sigma_{k^{(p)}}$, where

• the parallel map $\gamma$ is called SubBytes and it works in parallel on each
of the 16 bytes of the data;

• the affine map $\lambda$ is the composition of two operations known as
ShiftRows and MixColumns;

• $\sigma_{k^{(p)}}$ is the translation with the round key $k^{(p)}$ (this
operation is called AddRoundKey).

The last round (Round $N$) is atypical and is characterized by
$\gamma\lambda\sigma_{k^{(N)}}$ where the affine map $\lambda$ is only made by
the ShiftRows operation. So we obtain our ciphertext $y = \varphi_k(x)$.

In the following, we analyze the structure of each component of the round 
function. 



Note that the order of the operations is exactly: $\gamma$, $\lambda$, and then $\sigma_k$.
1.5.1 SubBytes

The vector space $V$ is the direct sum $V = V_1 \oplus \cdots \oplus V_{16}$
where each $V_i = (\mathbb{F}_2)^8$ ($1 \le i \le 16$). Any parallel map
$\gamma \in \mathrm{Sym}(V)$ acts on an element $v \in V$ as
$v\gamma = v_1\gamma_1 \oplus \cdots \oplus v_{16}\gamma_{16}$, where
$v_i \in V_i$ and $\gamma_i \in \mathrm{Sym}(V_i)$. The SubBytes operation
$\gamma$ is composed of two transformations: the inversion in
$\mathbb{F}_{2^8}$ and an affine transformation.

The inversion operation is the patched inversion in $\mathbb{F}_{2^8}$ (i.e.
$x \mapsto x^{254}$, which sends $0$ to $0$). The affine transformation over
$\mathbb{F}_2$ consists of an affine mapping
$(\mathbb{F}_2)^8 \to (\mathbb{F}_2)^8$, specified by an $8 \times 8$ circulant
matrix over $\mathbb{F}_2$ and a translation. The result of the inversion is
regarded as a vector in $(\mathbb{F}_2)^8$ and the output is obtained by
applying this affine mapping to it.
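The two steps of SubBytes can be sketched as follows. The circulant matrix and the translation constant 0x63 are not reproduced in the text above; the sketch takes them from the public AES specification (FIPS 197), written bitwise, and all function names are our own choices.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiplication in F_2[x]/(m) for the AES polynomial m."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return r


def patched_inverse(x: int) -> int:
    """x -> x^254: the field inverse for x != 0, and 0 -> 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, x)
    return r


def affine(b: int) -> int:
    """Affine map of FIPS 197: output bit i is
    b_i + b_{i+4} + b_{i+5} + b_{i+6} + b_{i+7} + c_i (indices mod 8), c = 0x63."""
    out = 0
    for i in range(8):
        bit = (b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8)) \
            ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i)
        out |= (bit & 1) << i
    return out


def sbox(x: int) -> int:
    """One brick gamma_i of SubBytes: inversion followed by the affine map."""
    return affine(patched_inverse(x))


print(hex(sbox(0x00)), hex(sbox(0x01)), hex(sbox(0x53)))
```

Only the inversion is non-linear over $\mathbb{F}_2$; the affine step is there to break the purely algebraic structure of $x \mapsto x^{254}$.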
1.5.2 Mixing Layer

The map $\lambda : V \to V$ is a composition of two linear operations:
ShiftRows and MixColumns. The ShiftRows operation is performed as follows. Any
byte (an element of $\mathbb{F}_{2^8}$) in row $i$ of the State, where
$0 \le i \le 3$, is cyclically shifted (towards left) by $i$ positions, as
follows:

$$\begin{pmatrix} s_0 & s_4 & s_8 & s_{12} \\ s_1 & s_5 & s_9 & s_{13} \\ s_2 & s_6 & s_{10} & s_{14} \\ s_3 & s_7 & s_{11} & s_{15} \end{pmatrix} \;\xrightarrow{\ \text{ShiftRows}\ }\; \begin{pmatrix} s_0 & s_4 & s_8 & s_{12} \\ s_5 & s_9 & s_{13} & s_1 \\ s_{10} & s_{14} & s_2 & s_6 \\ s_{15} & s_3 & s_7 & s_{11} \end{pmatrix}$$



Since the AES consists of 10 rounds and each round requires 16 S-box
computations, the probability of there being no 0-inversions during an
encryption is $(255/256)^{160} \approx 0.53$.
In other words, we can describe the ShiftRows operation by the map

$$sh : (\mathbb{F}_{2^8})^{16} \to (\mathbb{F}_{2^8})^{16},$$

$$(s_0, s_1, \dots, s_{15}) \mapsto (s_0, s_5, s_{10}, s_{15}, s_4, s_9, s_{14}, s_3, s_8, s_{13}, s_2, s_7, s_{12}, s_1, s_6, s_{11}).$$

We can also represent the ShiftRows operation with the following $16 \times 16$
block diagonal matrix

$$\begin{pmatrix} I & & & \\ & R & & \\ & & R^2 & \\ & & & R^3 \end{pmatrix}, \qquad R = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix},$$

where the matrix $R$ is a permutation matrix over $\mathbb{F}_{2^8}$ that
represents the shift of one row by one position.

In order to describe the MixColumns operation, each column of the State
can be treated as a four-term polynomial in $\mathbb{F}_{256}[z]$. Let $c(z)$
be one such polynomial. Then each column is replaced by the result of the
multiplication in $\mathbb{F}_{256}[z]/(z^4 + 1)$ by $a(z)$,
$c \mapsto c \cdot a \bmod (z^4 + 1)$,

$$(c_1, c_2, c_3, c_4) \mapsto (c_1 \cdot a,\; c_2 \cdot a,\; c_3 \cdot a,\; c_4 \cdot a).$$

Note that $a(z)$ is invertible in $\mathbb{F}_{256}[z]/(z^4 + 1)$. On the other
hand, we can see the MixColumns operation as a 4-block diagonal matrix, each
block the same MDS matrix (i.e. all minors are non-zero):

$$\begin{pmatrix} z & z+1 & 1 & 1 \\ 1 & z & z+1 & 1 \\ 1 & 1 & z & z+1 \\ z+1 & 1 & 1 & z \end{pmatrix}$$



Remark 1.14. This MDS property is used to ensure that the number of active
S-boxes involved in a differential or linear attack increases rapidly, and the
security of the AES against these particular attacks can be established.

Obviously, we can also see the whole Mixing Layer (a linear operation)
as a matrix $M$. We observe that the order of this matrix is quite small, i.e.
$M^8 = 1$. (Also, the orders of ShiftRows and MixColumns are both equal to 4.)
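The orders of these maps can be verified numerically. The sketch below is our own illustration: it stores the State as 16 bytes in column-major order, uses the byte encoding 0x02 for the class of $z$ and 0x03 for $z + 1$ in the MDS matrix, and determines the order of a linear map by iterating it on the 16 unit vectors (which suffices because the maps are $\mathbb{F}_{256}$-linear).

```python
AES_POLY = 0x11B


def gf_mul(a: int, b: int) -> int:
    """Multiplication in the AES field F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return r


def shift_rows(s):
    """Byte at row r, column c is s[r + 4*c]; row r is rotated left by r."""
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


MIX = [[2, 3, 1, 1],
       [1, 2, 3, 1],
       [1, 1, 2, 3],
       [3, 1, 1, 2]]


def mix_columns(s):
    """Multiply every column of the State by the MDS matrix."""
    out = [0] * 16
    for c in range(4):
        for r in range(4):
            acc = 0
            for j in range(4):
                acc ^= gf_mul(MIX[r][j], s[j + 4 * c])
            out[r + 4 * c] = acc
    return out


def mixing_layer(s):
    """ShiftRows followed by MixColumns."""
    return mix_columns(shift_rows(s))


def order(f, max_order=64):
    """Smallest k >= 1 with f^k = identity, tested on the 16 unit vectors."""
    basis = [[1 if i == j else 0 for i in range(16)] for j in range(16)]
    images = [list(b) for b in basis]
    for k in range(1, max_order + 1):
        images = [f(v) for v in images]
        if images == basis:
            return k
    raise ValueError("order exceeds max_order")


print(order(shift_rows), order(mix_columns), order(mixing_layer))
```

ShiftRows and MixColumns do not commute, which is why the order of their composition (8) differs from the order of each factor (4).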
1.6 The SERPENT cryptosystem

Let $\mathcal{M} = V = (\mathbb{F}_2)^r$, with $r = 128$. We consider
$\mathcal{K} = (\mathbb{F}_2)^\ell$ with the fixed length $\ell = 128$, although
the key is designed with variable length.
The encryption proceeds by $N = 32$ similar rounds and it works as follows:

• a preliminary permutation $\pi : V \to V$ is applied (this is not used for
security, rather to ease the implementation);

• there is a preliminary translation with the first round key;

• $N - 1$ rounds with the same structure are applied, but using a different
permutation, each composed of a key translation $\sigma_k$, a parallel S-box
$\gamma$ and a linear mixing-layer $\lambda$ (we denote the round $p$ by
Round $p$, with $p = 1, \dots, 31$);

• the last round (Round 32) follows and it consists of the composition
$\gamma\lambda\sigma_k$ where $\lambda = 1_V$;

• a final permutation $\pi^{-1} : V \to V$ is performed.

The decryption process is easily obtained by inverting every step of the
encryption, using the inverse of the S-boxes, the inverse of the mixing-layer
and the reverse order of the round keys.

Let $p$ be a natural number such that $1 \le p \le 31$. In order to describe a
typical round (Round $p$) we have to specify how the components $\gamma$,
$\lambda$ and $\sigma_k$ are applied. We note that, after the permutation
$\pi : V \to V$, we perform a preliminary translation $\sigma_{k^{(0)}}$, where
$k^{(0)} \in (\mathbb{F}_2)^r$ is the first round key.

Let $V = V_1 \oplus \cdots \oplus V_{32}$, where, for any $1 \le j \le 32$, each
$V_j = (\mathbb{F}_2)^4$. Any $\gamma \in \mathrm{Sym}(V)$ acts as
$v\gamma = v_1\gamma_1 \oplus \cdots \oplus v_{32}\gamma_{32}$, where
$v_j \in V_j$ and $\gamma_j \in \mathrm{Sym}(V_j)$.

We have to characterize each $\gamma_j$ (i.e. we have to construct each S-box).
The eight S-boxes of SERPENT were built "ad hoc" starting from the 8 fixed
S-boxes of DES (see [ABK98]). To each $v_j$ we apply the same S-box
$S_{p \bmod 8}$, so that $S_{p \bmod 8}(v_j)$ lies in $(\mathbb{F}_2)^4$. That is,
$\gamma_1 = \gamma_2 = \cdots = \gamma_{32} = S_{p \bmod 8}$.

Then the linear transformation $\lambda$ (described in [ABK98]) and a final
translation $\sigma_{k^{(p)}}$ are applied. The last round (Round 32) is only
slightly different. The only difference from a typical round is the replacement
of the linear transformation $\lambda$ by $1_V$.
1.7 PRESENT: an ultra-lightweight block cipher

PRESENT is an iterated block cipher that consists of $N = 31$ rounds.
Let $\mathcal{M} = V = (\mathbb{F}_2)^r$ with $r = 64$. Let
$\mathcal{K} = (\mathbb{F}_2)^\ell$ where $\ell$ may be equal to 80 or 128. We
consider only the version of PRESENT with $\mathcal{K} = (\mathbb{F}_2)^{80}$,
since its authors recommend it in order to have a good performance.
We are going to describe how the round function
$\gamma\lambda\sigma_{k^{(p)}}$ (in the $p$-th typical round) is performed.

As in the AES and SERPENT cryptosystems, the encryption process starts
with a preliminary round (Round 0) that consists of a parallel map
$\gamma = 1_V$, a linear transformation $\lambda = 1_V$ and the translation
$\sigma_{k^{(0)}}$, where $k^{(0)} \in (\mathbb{F}_2)^r$ is the first round key.
A typical round consists of the non-linear operation, called sBoxLayer, the
linear transformation, known as pLayer, and the sum with the round key.

The parallel map $\gamma \in \mathrm{Sym}(V)$ used in PRESENT acts as

$$v\gamma = v_1\gamma_1 \oplus \cdots \oplus v_{16}\gamma_{16},$$

where each $v_i \in (\mathbb{F}_2)^4$ and
$\gamma_i \in \mathrm{Sym}((\mathbb{F}_2)^4)$ ($1 \le i \le 16$). The action of
any brick $\gamma_i : (\mathbb{F}_2)^4 \to (\mathbb{F}_2)^4$ is given by the
following table, using hexadecimal notation:



x           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
x\gamma_i   C  5  6  B  9  0  A  D  3  E  F  8  4  7  1  2



The affine map $\lambda : V \to V$ is a bit permutation as given by the
following table, where the bit $i$ of the intermediate state is moved to the
bit position $P(i)$.

i      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
P(i)   0 16 32 48  1 17 33 49  2 18 34 50  3 19 35 51

i     16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
P(i)   4 20 36 52  5 21 37 53  6 22 38 54  7 23 39 55

i     32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
P(i)   8 24 40 56  9 25 41 57 10 26 42 58 11 27 43 59

i     48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
P(i)  12 28 44 60 13 29 45 61 14 30 46 62 15 31 47 63
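The table has a simple closed form, $P(i) = 16\,i \bmod 63$ for $i < 63$ and $P(63) = 63$, which the sketch below uses to regenerate it. As a further illustration it tests the "proper mixing layer" condition of Definition 1.8 for this bit permutation. The test relies on an observation of ours that is valid for bit permutations: a sum of bricks is invariant exactly when the corresponding set of 4-bit blocks is closed under "block $j$ sends a bit into block $t$", so $\lambda$ is proper when every block reaches all 16 blocks. The function names are hypothetical.

```python
def p_layer_index(i: int) -> int:
    """Closed form of the pLayer table: P(i) = 16*i mod 63, with P(63) = 63."""
    return 63 if i == 63 else (16 * i) % 63


P = [p_layer_index(i) for i in range(64)]


def block_successors(j: int) -> set:
    """4-bit blocks that receive at least one bit of block j (bits 4j..4j+3)."""
    return {P[4 * j + k] // 4 for k in range(4)}


def reachable(start: int) -> set:
    """Smallest set of blocks containing `start` and closed under successors."""
    seen, todo = {start}, [start]
    while todo:
        j = todo.pop()
        for t in block_successors(j) - seen:
            seen.add(t)
            todo.append(t)
    return seen


# Proper iff no non-empty proper set of blocks is closed under successors.
is_proper = all(len(reachable(j)) == 16 for j in range(16))

print(P[:16])
print(is_proper)
```

Each block spreads its four bits over four different blocks, so after two applications every block has influenced all sixteen, which is the diffusion property the definition of a proper mixing layer is meant to capture.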
2 Known attacks

AES's structure has been used to carry out some innovative analysis. Such 
attacks tend to have a similar form: 

• they identify a property holding for a few rounds with a good probability; 

• they use special techniques to extend the attack to more rounds. 

The following table summarizes the most successful attacks on round-reduced
versions of the AES cryptosystem:



Key   Rounds   Texts                 Time          Type                             Reference
128   5        2^{11}                2^{40}        Square attack                    [DR98]
128   5        2^{29.5}              2^{31}        Impossible diff.                 [BK00]
128   5        2^{39}                2^{39}        Boomerang attack                 [Bir04]
128   6        2^{32}                2^{72}        Square attack                    [DR98]
128   6        2^{34.6}              2^{44}        Partial Sum                      [FKL+00]
128   6        2^{91.5}              2^{122}       Impossible diff.                 [CKK+01]
128   6        2^{71}                2^{71}        Boomerang attack                 [Bir04]
128   7        2^{128} - 2^{119}     2^{120}       Partial Sum                      [FKL+00]
128   7        2^{32}                2^{128}       Collision                        [GM00]
192   7        2^{91.2}              2^{139.2}     Impossible diff.                 [?]
192   8        2^{127}               2^{188}       Partial Sum                      [FKL+00]
192   10       2^{124}               2^{183}       (Related-key) Rectangle          [BDK05]
192   12       2^{123}               2^{176}       (Related-key) Ampl. Boomerang    [BK09]
256   8        2^{32}                2^{194}       Partial Sum                      [FKL+00]
256   9        2^{85}                2^{126}       Partial Sum                      [FKL+00]
256   10       2^{114}               2^{173}       (Related-key) Rectangle          [BDK05]
256   14       2^{119}               2^{119}       (Related-key) Boomerang          [BK09]



Other researchers attack small scale variants of the AES, where the message
space and the key space are also reduced (see e.g. [CW09]). A recent practical
attack (due to A. Biryukov, O. Dunkelman, N. Keller, D. Khovratovich and
A. Shamir) on a 10-round version of AES-256 has been presented in [BDK+10].
3 First results

In the literature there are some ways of representing the same cipher (e.g.
AES), like BES [MR02] or Dual Ciphers [BB02], that could be useful for the
cryptanalysis. Other ways of representing AES that exploit its structure can
be found, for example, in [CMR07].

In this section we represent "AES-like" ciphers by embedding them into larger
ciphers. In Subsection 3.1 we begin with a set $\Omega$ and a group of
permutations acting on it. We want to enlarge $\Omega$ to a set $W$ such that:

(1) $W$ is endowed with a vector space structure;

(2) the permutations can be extended to act linearly on the whole $W$.

In Subsection 3.2 we provide one specific embedding of AES-like ciphers that
linearizes the non-linear part of these ciphers, but it fails to linearize the
whole cipher. In particular our embedding can be applied to AES, PRESENT and
SERPENT.

3.1 Some preliminary results 

Let $\Omega$ be a set such that $|\Omega| = n$, let $\mathrm{Sym}(\Omega)$ be
the symmetric group on $\Omega$ and let $W$ be a vector space over a field
$\mathbb{F}$ (not necessarily a finite field).

Definition 3.1. Let $G \le \mathrm{Sym}(\Omega)$. An injective map
$\phi : \Omega \to W$ is a space embedding with respect to the group $G$ if,
$\forall \sigma \in G$, $\exists \Lambda_\sigma \in \mathrm{GL}(W)$ such that
$\phi \circ \sigma = \Lambda_\sigma \circ \phi$.

Moreover, $\phi(\Omega)$ is the set of all admissible vectors (w.r.t. $\phi$),
the subspace $\langle \phi(\Omega) \rangle$ is the admissible space. Note that
since $\phi(\Omega) \subset \langle \phi(\Omega) \rangle$ then
$\langle \phi(\Omega) \rangle$ is the smallest subspace containing all
admissible vectors. Generally speaking,

$$|\langle \phi(\Omega) \rangle| \gg |\phi(\Omega)|.$$

Note that the regular representation (see Subsection 1.1) can be considered
as a space embedding $\phi : \Omega \to W$ with respect to the group
$G = \mathrm{Sym}(\Omega)$, where $\dim(W) = |\Omega| = n$ and
$\phi : \omega \mapsto b_\omega$ with $\{b_\omega\}_{\omega \in \Omega}$ a basis
of $W$. Also, $W = \langle \phi(\Omega) \rangle$.

A space embedding allows us to construct a faithful representation of $G$, as
explained in the next proposition.

Proposition 3.2. Let $\phi : \Omega \to W$ be a space embedding with respect to
$G$. Suppose that $\forall \sigma \in G$ $\exists! \Lambda_\sigma \in \mathrm{GL}(W)$
s.t. $\phi \circ \sigma = \Lambda_\sigma \circ \phi$. Then

(1) we can define a map $\bar{\phi} : G \to \mathrm{GL}(W)$, where
$\bar{\phi}(\sigma) = \Lambda_\sigma$, for any $\sigma \in G$;

(2) $\bar{\phi}$ is a group homomorphism.
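Proof. 1. Obvious.

2. We have to prove that
$\bar{\phi}(\sigma\sigma') = \bar{\phi}(\sigma)\bar{\phi}(\sigma')$ for all
$\sigma, \sigma' \in G$, i.e. $\Lambda_{\sigma\sigma'} = \Lambda_\sigma\Lambda_{\sigma'}$.
Using Definition 3.1, the following equality holds

$$\Lambda_{\sigma\sigma'}(\phi(\omega)) = \phi((\sigma\sigma')(\omega)) = \phi(\sigma(\sigma'(\omega))).$$

Since

$$\Lambda_\sigma\Lambda_{\sigma'}(\phi(\omega)) = \Lambda_\sigma(\phi(\sigma'(\omega))) = \phi(\sigma(\sigma'(\omega))),$$

the maps $\Lambda_{\sigma\sigma'}$ and $\Lambda_\sigma\Lambda_{\sigma'}$ agree on
$\phi(\omega)$ for all $\omega \in \Omega$, and by the uniqueness assumption we
conclude that $\Lambda_{\sigma\sigma'} = \Lambda_\sigma\Lambda_{\sigma'}$. □

Remark 3.3. In Definition 3.1 we require only that $\Lambda_\sigma$ exists,
however in Theorem 3.2 we see that it is also unique.

For example, for the regular representation any permutation
$\sigma \in \mathrm{Sym}(\Omega)$ defines a permutation
$\bar{\sigma} \in \mathrm{Sym}(\{b_\omega\}_{\omega \in \Omega})$ and so it
defines a unique $\Lambda_\sigma \in \mathrm{GL}(W)$, which can be represented
as a permutation matrix.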

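Now, we are interested in a special case of space embedding where the set
$\Omega$ is a vector space $V = (\mathbb{F}_2)^r$ and $W$ is the vector space
$(\mathbb{F}_2)^s$, with $s > r$. For any $1 \le i \le s$, let $e_i \in W$ be

$$e_i = (0, \dots, 0, 1, 0, \dots, 0), \quad \text{with the } 1 \text{ in position } i.$$

Let $\sigma \in \mathrm{Sym}(V)$ be any permutation over $(\mathbb{F}_2)^r$. We
want to embed $V$ into $W$ by an injective map $\alpha$ and to extend $\sigma$
to a permutation $\sigma' \in \mathrm{Sym}(W)$ as shown in the following
commutative diagram:

$$\begin{array}{ccc} V & \xrightarrow{\ \sigma\ } & V \\ {\scriptstyle\alpha}\downarrow & & \downarrow{\scriptstyle\alpha} \\ W & \xrightarrow{\ \sigma'\ } & W \end{array}$$

In order to do this, we have to define the permutation
$\sigma' \in \mathrm{Sym}(W)$. We say that $\sigma'$ is an extension of
$\sigma$. We seek a $\sigma'$ that is linear on $W$. The following definition
will be useful:

Definition 3.4. Let $\sigma \in \mathrm{Sym}(V)$ and $\alpha$ be an injective
map $\alpha : V \to W$. We say that $\sigma$ is linearly extendible (via
$\alpha$) if $\forall \{v^i\}_{i \in I} \subset V$ we have

$$\sum_{i \in I} \alpha(v^i) = 0 \;\Longrightarrow\; \sum_{i \in I} \alpha(\sigma(v^i)) = 0.$$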

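Remark 3.5. Since we are considering the finite field $\mathbb{F}_2$, we note
that $\sigma$ is linearly extendible (via $\alpha$) if
$\forall \{v^i\}_{i \in I} \subset V$ such that $\sum_{i \in I} \alpha(v^i) = 0$
we have $\sum_{i \in I} \alpha(\sigma(v^i)) = 0$. Then the injective map
$\{v^i\}_{i \in I} \mapsto \{\sigma(v^i)\}_{i \in I}$ defined on the set

$$\Big\{ \{v^i\}_{i \in I} \subset V \;\Big|\; \sum_{i \in I} \alpha(v^i) = 0 \Big\}$$

into the set

$$\Big\{ \{\sigma(v^i)\}_{i \in I} \subset V \;\Big|\; \sum_{i \in I} \alpha(\sigma(v^i)) = 0 \Big\}$$

is a bijective map, since the cardinality of the two finite sets is the same.

Let $\alpha : V \to W$ be a space embedding. Let
$A = \mathrm{Im}(\alpha) = \alpha(V)$ and let $T = \langle A \rangle$ be the
subspace (the admissible space) of $W$ linearly generated by $A$. Since
$\sigma'(\alpha(v)) = \alpha(\sigma(v))$, $\forall v \in V$, we require that
$\sigma'(A) = A$.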




In order to specify the behavior of $\sigma'$ on $T \setminus A$, which is the
set of non-admissible vectors in the admissible space, we have to consider two
different cases:

(a) suppose that $\sigma$ is linearly extendible. Let $t \in T$; we must have
$t = \sum_{1 \le j \le l} a^j$, with $l \ge 1$,
$\{a^j\}_{1 \le j \le l} \subset A$ and $a^j = \alpha(v^j)$ (with
$1 \le j \le l$ and $v^j \in V$). Then we define

$$\sigma'(t) = \sum_{1 \le j \le l} \sigma'(a^j) = \sum_{1 \le j \le l} \alpha(\sigma(v^j));$$

(b) in case $\sigma$ is not linearly extendible, we define
$\sigma'|_{T \setminus A} = \mathrm{id}_{T \setminus A}$.

We now define $\sigma'$ on $W \setminus T$ according to the two previous cases
(i.e. depending on the behavior of $\sigma$ on $A$).

In case (a), let $t$ be the dimension of the subspace $T$. We consider any
subset $B$ of $\{e_1, \dots, e_s\}$ such that $|B| = s - t$ and $W$ is the
direct sum $W = T \oplus \langle B \rangle$. It is obvious that $B$ exists. Let
$w \in W$, then $w = w_T + w_B$ with $w_T \in T$ and
$w_B \in \langle B \rangle$. Finally, we define

$$\sigma'(w) = \sigma'(w_T) + w_B.$$

In case (b) we define $\sigma'|_{W \setminus T} = \mathrm{id}_{W \setminus T}$.

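Lemma 3.6. If $\sigma$ is linearly extendible, then $\sigma' \in \mathrm{GL}(W)$.

Proof. We first show that $\sigma'$ is well-defined on $T$. Let
$t = \sum_{i \in I} a^i$ and $t' = \sum_{j \in J} a^j$, with
$a^i = \alpha(v^i)$ and $a^j = \alpha(v^j)$, and suppose that $t = t'$, i.e.
$\sum_{i \in I} a^i + \sum_{j \in J} a^j = 0$. Since $\sigma$ is linearly
extendible, we have

$$\sigma'(t) + \sigma'(t') = \sum_{i \in I} \sigma'(a^i) + \sum_{j \in J} \sigma'(a^j) = \sum_{i \in I} \alpha(\sigma(v^i)) + \sum_{j \in J} \alpha(\sigma(v^j)) = 0,$$

so $\sigma'(t) = \sigma'(t')$.

We now show that $\sigma'$ is linear on $T$. Let $t_i = \sum_h a_i^h$. We have
to show that $\sigma'(\sum_i t_i) = \sum_i \sigma'(t_i)$. Clearly,

$$\sigma'\Big(\sum_i \sum_h a_i^h\Big) = \sigma'\Big(\sum_{i,h} a_i^h\Big) = \sum_i \sum_h \sigma'(a_i^h) = \sum_i \sigma'\Big(\sum_h a_i^h\Big) = \sum_i \sigma'(t_i),$$

and we have our thesis.

Since $\sigma'$ is linear on $T$ and $T$ is a finite set, in order to prove that
$\sigma'$ is bijective on $T$ it suffices to show that $\ker \sigma' = 0$. We
have (by definition of linearly extendible)

$$0 = \sigma'(t) = \sum_i \alpha(\sigma(v^i)) \;\Longrightarrow\; t = \sum_i \alpha(v^i) = 0.$$

Finally, we show the linearity on $W$. Let $\{w^i\}_{i \in I} \subset W$; we
have to show the following equality:

$$\sigma'\Big(\sum_{i \in I} w^i\Big) = \sum_{i \in I} \sigma'(w^i). \qquad (1)$$

Since $W$ is the direct sum of $T$ and $\langle B \rangle$, each element in $W$
can be written as $w_T^i + w_B^i$ and so we can write the following:

$$\sigma'\Big(\sum_{i \in I} w^i\Big) = \sigma'\Big(\sum_{i \in I} (w_T^i + w_B^i)\Big) = \sigma'\Big(\sum_{i \in I} w_T^i\Big) + \sum_{i \in I} w_B^i,$$

$$\sum_{i \in I} \sigma'(w^i) = \sum_{i \in I} \sigma'(w_T^i + w_B^i) = \sum_{i \in I} \sigma'(w_T^i) + \sum_{i \in I} w_B^i.$$

It easily follows that (1) holds if and only if

$$\sigma'\Big(\sum_{i \in I} w_T^i\Big) = \sum_{i \in I} \sigma'(w_T^i),$$

which is the linearity of $\sigma'$ on $T$ shown above.

□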

Remark 3.7. The construction of $\sigma' \in GL(W)$ from a linearly extendible $\sigma$ (Definition 3.4) can be done similarly over any field.

We are now able to prove the main result of this subsection.

Theorem 3.8. Let $W = (\mathbb{F}_2)^s$ and $G \le \mathrm{Sym}(V)$. An injective map $\alpha : V \to W$ is a space embedding with respect to $G$ if and only if, $\forall \sigma \in G$, $\sigma$ is linearly extendible.

Proof. Let $\alpha$ be a space embedding with respect to $G$. For any fixed $\sigma \in G$, there exists a map $\Lambda_\sigma \in GL(W)$ such that $\alpha \circ \sigma = \Lambda_\sigma \circ \alpha$. Now, let $I$ be a finite set such that $w^i = \alpha(v^i)$ (for any $i \in I$) and $\sum_{i \in I} w^i = 0$. Obviously we have

$$\sum_{i \in I} \alpha(\sigma(v^i)) = \sum_{i \in I} \Lambda_\sigma(\alpha(v^i)) = \sum_{i \in I} \Lambda_\sigma(w^i) = \Lambda_\sigma\Big(\sum_{i \in I} w^i\Big) = 0.$$

The converse immediately follows thanks to the previous lemma. □

Remark 3.9. For a fixed $\alpha$ and $\sigma$, the map $\sigma'$ is unique and the induced map $G \to GL(W)$ is a representation of $G$, by Proposition 3.2.

Remark 3.10. In the following we use $\Lambda_\sigma$ and $\sigma'$ interchangeably.

3.2 A first embedding 

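We now apply the theory developed in the previous section to a specific space embedding† $\varepsilon : V \to W$.

Let us identify $(\mathbb{F}_2)^m$ with the field $\mathbb{F}_{2^m}$, via the quotient map $\mathbb{F}_{2^m} \leftrightarrow \mathbb{F}_2[x]/(p)$, where $p \in \mathbb{F}_2[x]$ is any primitive polynomial such that $\deg(p) = m$. We define a map $\varepsilon' : \mathbb{F}_{2^m} \to (\mathbb{F}_2)^{2^m}$ by means of a primitive element $\gamma$ of $\mathbb{F}_{2^m}$ (which is a root of $p$). The map $\varepsilon'$ is defined as

$$\varepsilon'(0) = (1, \underbrace{0, \dots, 0}_{2^m - 1}), \qquad \varepsilon'(\gamma^i) = (\underbrace{0, \dots, 0, 1}_{i+1}, 0, \dots, 0) \quad \forall\, 1 \le i \le 2^m - 1.$$

Note that $\varepsilon'(1) = \varepsilon'(\gamma^{2^m - 1}) = (0, \dots, 0, 1)$.

Let $b$ be a positive integer, let $r = mb$ and $s = 2^m b$. Let $V = (\mathbb{F}_2)^r$ and $W = (\mathbb{F}_2)^s$. We construct our injective map $\varepsilon : V \to W$ in the following way:

$$\varepsilon(v_1, \dots, v_b) = (\varepsilon'(v_1), \dots, \varepsilon'(v_b)) \qquad (2)$$

for any $v_j \in (\mathbb{F}_2)^m$ ($1 \le j \le b$). Note that $\varepsilon$ is a parallel‡ map.

For simplicity of notation, we set $e_1 = \varepsilon'(0) = (1, \underbrace{0, \dots, 0}_{2^m - 1})$ and $e_{i+1} = \varepsilon'(\gamma^i)$, for any $1 \le i \le 2^m - 1$.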
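The definition of $\varepsilon'$ and $\varepsilon$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: field elements are encoded as integers whose bits are polynomial coefficients, $\gamma$ is taken to be the class of $x$, and the toy field $\mathbb{F}_4$ with $p = x^2 + x + 1$ is used. The names `position_table`, `eps_prime` and `eps` are hypothetical and do not come from the paper.

```python
def position_table(m: int, poly: int) -> dict[int, int]:
    """Map each element of F_{2^m} (as an int) to its 0-based slot under eps'."""
    slots = {0: 0}                 # eps'(0) = e_1
    x = 1
    for i in range(1, 2 ** m):
        x <<= 1                    # multiply by gamma = x
        if x >> m:                 # reduce modulo the primitive polynomial
            x ^= poly
        slots[x] = i               # eps'(gamma^i) = e_{i+1}
    return slots


def eps_prime(value: int, slots: dict[int, int]) -> list[int]:
    """Canonical basis vector of length 2^m associated with one field element."""
    vec = [0] * len(slots)
    vec[slots[value]] = 1
    return vec


def eps(blocks, slots) -> list[int]:
    """Parallel map (2): apply eps' to every block and concatenate."""
    out = []
    for v in blocks:
        out.extend(eps_prime(v, slots))
    return out


SLOTS = position_table(2, 0b111)   # assumed toy field F_4, p = x^2 + x + 1
print(eps_prime(1, SLOTS))         # eps'(1) = eps'(gamma^3) -> [0, 0, 0, 1]
print(eps((0, 2), SLOTS))          # b = 2 -> [1, 0, 0, 0, 0, 1, 0, 0]
```

Every image has weight exactly $b$, one nonzero entry per block, which is the property used repeatedly in the lemmas that follow.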

† which is called "$\alpha$" in Subsection 3.1.
‡ see Subsection 1.4.

We note that

Lemma 3.11. Suppose that $\sum_{i \in I} e_i = e_h$. Then $h \in I$.
Proof. It follows from $\mathrm{w}(e_i) = 1$, for all $i \in I$. □

The following lemma is easily proved:

Lemma 3.12. Let $I$ be a finite index multiset such that $\{v^i\}_{i \in I} \subset V$. For any $1 \le h \le b$ we have $\sum_{i \in I} \varepsilon'(v^i_h) = 0$ if and only if, $\forall i \in I$, $|\{j \in I \mid v^j_h = v^i_h\}|$ is even.

Proof. Since $\varepsilon'$ maps each element of $\mathbb{F}_{2^m}$ into the canonical basis of $(\mathbb{F}_2)^{2^m}$, each $\varepsilon'(v^i_h)$ is a vector such that $\mathrm{w}(\varepsilon'(v^i_h)) = 1$. Since the sum is taken in $\mathbb{F}_2$, we have that $\sum_{i \in I} \varepsilon'(v^i_h) = 0$ if and only if each component is the sum of an even number of 1s, i.e. if and only if each element of the canonical basis that appears in our sum appears an even number of times. Since $\varepsilon'$ is bijective onto the canonical basis, this holds if and only if $|\{j \in I \mid v^j_h = v^i_h\}|$ is even, $\forall i \in I$. □

Proposition 3.13. Let $\varepsilon$ be as in (2). Then $\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\varepsilon) \rangle) = 2^m b - (b - 1)$.

Proof. We define the elements

$$z_{i,j} = (e_1, \dots, e_1, e_j, e_1, \dots, e_1), \quad \text{with } e_j \text{ in position } i,$$

for $1 \le i \le b$ and $1 \le j \le 2^m$. Note that $z_{i,j} \ne z_{h,t}$ for $(i,j) \ne (h,t)$, except for $z_{1,1} = z_{2,1} = \dots = z_{b,1}$. We consider the set $B = \{z_{1,1}\} \cup \{z_{i,j}\}_{j \ge 2,\ 1 \le i \le b}$. For instance, when $m = 2$ and $b = 2$, we have

$$B = \{(e_1, e_1), (e_1, e_2), (e_1, e_3), (e_1, e_4), (e_2, e_1), (e_3, e_1), (e_4, e_1)\}.$$

Clearly, the cardinality of the set $B$ is given by

$$|\{z_{i,j}\}_{1 \le i \le b,\ 1 \le j \le 2^m}| - |\{z_{i,1}\}_{i \ge 2}| = 2^m b - (b - 1).$$

We claim that the set $B$ is a basis for the subspace $\langle \mathrm{Im}(\varepsilon) \rangle$.

First, we prove that $B$ is a linearly independent set. Suppose $z_{i,j} \in B$ such that $(i,j) \ne (1,1)$. By definition of $B$, the element $z_{i,j}$ is the unique element of $B$ having a vector $e_j$ in position $i$. Thus, $z_{i,j}$ cannot be a linear combination (i.e. a sum) of any other vectors of $B$ (see Lemma 3.11). Now, we have to consider the element $z_{1,1}$. Let $z_{1,1} = \sum_{(i,j) \in J} z_{i,j}$. W.l.o.g., we can assume by Lemma 3.11 that there is $(\bar{i}, \bar{j}) \in J$ such that $z_{\bar{i},\bar{j}} = (e_1, \dots)$. Since $z_{\bar{i},\bar{j}} \ne z_{1,1}$ we can assume w.l.o.g. $z_{\bar{i},\bar{j}} = (e_1, e_{\bar{j}}, e_1, \dots, e_1)$, i.e. $\bar{i} = 2$. There is no other $z_{i,j}$ having $e_{\bar{j}}$ in the second position. Therefore, the sum $z_{1,1}$ should contain a 1 in component $2^m + \bar{j}$, which is impossible.

Next, we prove that $B$ generates $\langle \mathrm{Im}(\varepsilon) \rangle$. To do that, it suffices to prove that every element of $\mathrm{Im}(\varepsilon)$ belongs to the subspace generated by $B$. If we consider an element $w = (e_{j_1}, \dots, e_{j_b}) \in \mathrm{Im}(\varepsilon)$, we have

$$w = \begin{cases} z_{1,j_1} + \dots + z_{b,j_b} & \text{if } b \text{ is odd}, \\ z_{1,j_1} + \dots + z_{b,j_b} + z_{1,1} & \text{if } b \text{ is even}, \end{cases}$$

since, for $b$ odd,

$$(e_{j_1}, e_1, \dots, e_1) + (e_1, e_{j_2}, \dots, e_1) + \dots + (e_1, \dots, e_1, e_{j_b}) = (e_{j_1}, e_{j_2}, \dots, e_{j_b}),$$

and, for $b$ even,

$$(e_1, e_1, \dots, e_1) + (e_{j_1}, e_1, \dots, e_1) + (e_1, e_{j_2}, \dots, e_1) + \dots + (e_1, \dots, e_1, e_{j_b}) = (e_{j_1}, e_{j_2}, \dots, e_{j_b}).$$

□
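Proposition 3.13 can be checked numerically for small parameters. The sketch below enumerates $\mathrm{Im}(\varepsilon)$ and computes the rank of its span over $\mathbb{F}_2$. It rests on one simplifying assumption: the block value itself is used as the index of the nonzero entry, instead of the discrete logarithm with respect to $\gamma$. This only permutes coordinates inside each block and so does not change the rank. All function names are illustrative.

```python
from itertools import product


def rank_gf2(rows) -> int:
    """Rank over GF(2); each row is an int whose bits are the entries."""
    basis = {}                       # leading-bit position -> reduced row
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]      # cancel the leading bit and continue
    return len(basis)


def embed(blocks, m: int) -> int:
    """eps as a bitmask: block j sets a single bit at offset j*2^m + value."""
    word = 0
    for j, v in enumerate(blocks):
        word |= 1 << (j * 2 ** m + v)
    return word


def dim_span_image(m: int, b: int) -> int:
    rows = [embed(v, m) for v in product(range(2 ** m), repeat=b)]
    return rank_gf2(rows)


for m, b in [(1, 2), (2, 2), (2, 3), (3, 2)]:
    # computed dimension next to the closed formula 2^m b - (b - 1)
    print(m, b, dim_span_image(m, b), 2 ** m * b - (b - 1))
```

For $m = 2$, $b = 2$ both columns give 7, matching the basis $B$ listed in the proof.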



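Let $\mathcal{A}$ be a subset of the plaintext set $\mathcal{M}$ such that $|\mathcal{A}| = \dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\varepsilon) \rangle) = 2^m b - (b - 1)$. Let $a^i \in \mathcal{A}$, $1 \le i \le |\mathcal{A}|$. We construct the $(|\mathcal{A}| \times 2^m b)$-matrix $H$ such that the $i$-th row is the image of the parallel map $\varepsilon$ applied to the plaintext $a^i \in \mathcal{A}$, for $i \in \{1, \dots, |\mathcal{A}|\}$:

$$H = \begin{pmatrix} \varepsilon(a^1) \\ \vdots \\ \varepsilon(a^{|\mathcal{A}|}) \end{pmatrix} = \begin{pmatrix} \varepsilon'(a^1_1) & \cdots & \varepsilon'(a^1_b) \\ \vdots & & \vdots \\ \varepsilon'(a^{|\mathcal{A}|}_1) & \cdots & \varepsilon'(a^{|\mathcal{A}|}_b) \end{pmatrix} \qquad (3)$$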



We would like to determine the expected rank for such a matrix. Generally speaking, for a random $(t \times n)$-matrix with entries in the finite field $\mathbb{F}_q$, we can use the following well known results:

Theorem 3.14 ([MMM04]). Let $t, k, n \in \mathbb{N} \setminus \{0\}$, where $k \le n$ and $k \le t$.

(1) The number of ordered $k$-tuples of linearly independent vectors in $(\mathbb{F}_q)^n$ is

$$\prod_{i=0}^{k-1} (q^n - q^i).$$

(2) The number of $k$-dimensional subspaces of $(\mathbb{F}_q)^n$ is given by the $q$-binomial coefficient

$$\binom{n}{k}_q = \prod_{i=0}^{k-1} \frac{q^n - q^i}{q^k - q^i}.$$

(3) The number of $(t \times n)$-matrices of rank $k$ with entries in $\mathbb{F}_q$ is given by the following formula

$$d_{k,t} = \binom{t}{k}_q \prod_{i=0}^{k-1} (q^n - q^i).$$



We note that 



(4) 



By using the previous theorem, the relation in (4) and observing that

$$\frac{d_{t-2,t}}{d_{t,t}} = \frac{d_{t-2,t}}{d_{t-1,t}} \cdot \frac{d_{t-1,t}}{d_{t,t}}$$

we immediately get the following corollary:

Corollary 3.15. Let $q = 2$ and suppose $t < n$. We have the following relations:

$$d_{t,t} = (2^n - 1)(2^n - 2) \cdots (2^n - 2^{t-1});$$

$$\frac{d_{t-1,t}}{d_{t,t}} = \frac{2^t - 1}{2^n - 2^{t-1}};$$

$$\frac{d_{t-2,t}}{d_{t,t}} = \frac{(2^t - 1)(2^{t-1} - 1)}{3\,(2^n - 2^{t-2})(2^n - 2^{t-1})}.$$

Corollary 3.16. Let $q = 2$ and suppose $t = n$. We have the following relations:

$$d_{n,n} = (2^n - 1)(2^n - 2) \cdots (2^n - 2^{n-1});$$

$$\frac{d_{n-1,n}}{d_{n,n}} = \frac{2^n - 1}{2^{n-1}};$$

$$\frac{d_{n-2,n}}{d_{n,n}} = \frac{(2^n - 1)(2^{n-1} - 1)}{9 \cdot 2^{2n-3}}.$$

In other words, the probability that a $(t \times n)$ random matrix ($t < n$) with entries in $\mathbb{F}_2$ has rank exactly $t$ is significantly greater than the probability of having rank equal to $t - 1$ or $t - 2$ or less. Instead, for a square $(n \times n)$ random matrix, rank $n - 1$ is the most probable.
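The counting formula of Theorem 3.14 (3) is easy to evaluate exactly and to cross-check by exhaustive enumeration for tiny sizes. The following sketch does both over $\mathbb{F}_2$; the function names are illustrative, and the brute-force check is only feasible for very small $t$ and $n$.

```python
from fractions import Fraction
from itertools import product


def rank_gf2(rows) -> int:
    """Rank over GF(2); each row is an int whose bits are the entries."""
    basis = {}
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]
    return len(basis)


def q_binomial(t: int, k: int, q: int = 2) -> int:
    """Number of k-dimensional subspaces of (F_q)^t."""
    num = den = 1
    for i in range(k):
        num *= q ** t - q ** i
        den *= q ** k - q ** i
    return num // den


def count_rank(t: int, n: int, k: int, q: int = 2) -> int:
    """d_{k,t}: number of (t x n) matrices over F_q having rank k."""
    independent = 1
    for i in range(k):
        independent *= q ** n - q ** i
    return q_binomial(t, k, q) * independent


def brute_force_counts(t: int, n: int) -> list[int]:
    """Exhaustively count binary (t x n) matrices by rank."""
    counts = [0] * (min(t, n) + 1)
    for rows in product(range(2 ** n), repeat=t):
        counts[rank_gf2(rows)] += 1
    return counts


print(brute_force_counts(2, 3))                      # [1, 21, 42]
print([count_rank(2, 3, k) for k in range(3)])       # [1, 21, 42]
print(Fraction(count_rank(3, 3, 2), count_rank(3, 3, 3)))  # 7/4
```

The last line illustrates the square case: for $n = 3$ the ratio $d_{n-1,n}/d_{n,n}$ equals $(2^n - 1)/2^{n-1} = 7/4$, so rank $n - 1$ is indeed more frequent than full rank.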
Remark 3.17. In theory, the previous theorem cannot be applied to our case because our construction imposes specific constraints, for example on the row-weight. However, in practice our ratio approaches that of Corollary 3.16 for $t = \dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\varepsilon) \rangle)$.

So, in order to point out the distribution of the ranks of our matrices we provide a bound on the number of the full-rank matrices.

Lemma 3.18. Let $c = 2^m$, let $n = cb$ ($n > k$) and $z = \dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\varepsilon) \rangle)$. The total number of admissible vectors in $\langle \mathrm{Im}(\varepsilon) \rangle$ is $c^b$. The average number $\xi(h)$ of admissible vectors in a subspace generated by $h$ linearly independent admissible vectors is

$$\xi(0) = 0, \quad \xi(1) = 1, \quad \xi(2) = 2,$$

$$\xi(h) = h + (2^h - h - 1)\,\frac{c^b}{2^z}, \qquad 3 \le h \le z - 1.$$

Proof. An admissible vector can be any vector having weight 1 in each of the $b$ components. There are $c^b$ such vectors.

The whole space $\langle \mathrm{Im}(\varepsilon) \rangle$ contains $2^z$ vectors. The subspace $B$ generated by $h$ independent vectors $(V_1, \dots, V_h)$ contains $2^h$ vectors. Of these, $h$ are $(V_1, \dots, V_h)$ themselves (admissible) and one is the zero vector (non-admissible).

So $B$ contains $2^h - h - 1$ "other" vectors. To estimate how many of these are admissible, we simply multiply $2^h - h - 1$ by the ratio (admissible vectors)/(all vectors) $= c^b / 2^z$. Therefore, on average $B$ contains $h + (2^h - h - 1)\,c^b/2^z$ admissible vectors. □

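Theorem 3.19. Let $c = 2^m$, let $n = cb$ ($n > k \ge 1$) and $z = \dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\varepsilon) \rangle)$.

(1) The number of $(k \times n)$-matrices having rank $k$ can be estimated by the following formulas

$$\rho(k,k) = \prod_{1 \le i \le k} \big(c^b - \xi(i-1)\big),$$

i.e.

$$\rho(1,1) = c^b, \quad \rho(2,2) = c^b(c^b - 1), \quad \rho(3,3) = c^b(c^b - 1)(c^b - 2),$$

$$\rho(k,k) = c^b(c^b - 1)(c^b - 2) \prod_{4 \le i \le k} \Big(c^b - (i-1) - (2^{i-1} - i)\,\frac{c^b}{2^z}\Big), \qquad k \ge 4.$$

(2) The number of $(k \times n)$-matrices having rank $k - 1$ can be estimated by the following recursive formula

$$\rho(2,1) = c^b, \qquad \rho(3,2) = \rho(2,2)\,\xi(2) + \rho(2,1)\,(c^b - \xi(1)) = 3c^b(c^b - 1),$$

$$\rho(4,3) = \rho(3,3)\,\xi(3) + \rho(3,2)\,(c^b - \xi(2)),$$

$$\rho(k,k-1) = \rho(k-1,k-1)\,\xi(k-1) + \rho(k-1,k-2)\,(2^z - 2^{k-2})\,\frac{c^b}{2^z}, \qquad k \ge 5.$$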
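The estimates of Lemma 3.18 and Theorem 3.19 can be evaluated with exact rational arithmetic. The sketch below is a direct transcription of the formulas as stated here, using hypothetical function names (`xi`, `rho_full`, `rho_deficient`) and the toy parameters $m = 2$, $b = 2$, for which $c = 4$ and $z = 7$ by Proposition 3.13. The values are heuristic averages, not exact counts.

```python
from fractions import Fraction


def xi(h: int, c: int, b: int, z: int) -> Fraction:
    """Average number of admissible vectors in a span of h admissible vectors."""
    if h <= 2:
        return Fraction(h)
    return h + (2 ** h - h - 1) * Fraction(c ** b, 2 ** z)


def rho_full(k: int, c: int, b: int, z: int) -> Fraction:
    """Estimate rho(k, k): number of (k x n) admissible matrices of rank k."""
    total = Fraction(1)
    for i in range(1, k + 1):
        total *= c ** b - xi(i - 1, c, b, z)
    return total


def rho_deficient(k: int, c: int, b: int, z: int) -> Fraction:
    """Estimate rho(k, k-1) through the recursion of Theorem 3.19 (2)."""
    n_adm = c ** b
    if k == 2:
        return Fraction(n_adm)            # second row equals the first
    if k <= 4:
        independent = n_adm - xi(k - 2, c, b, z)
    else:
        independent = (2 ** z - 2 ** (k - 2)) * Fraction(n_adm, 2 ** z)
    return (rho_full(k - 1, c, b, z) * xi(k - 1, c, b, z)
            + rho_deficient(k - 1, c, b, z) * independent)


c, b, z = 4, 2, 7                          # toy case m = 2, b = 2
print(rho_full(3, c, b, z))                # 16 * 15 * 14 = 3360
print(rho_deficient(3, c, b, z))           # 3 * 16 * 15 = 720
print(rho_deficient(5, c, b, z) / rho_full(5, c, b, z))
```

The last ratio is the analogue, for admissible matrices, of the ratio $d_{k-1,k}/d_{k,k}$ of Corollary 3.15.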

Proof. (1) In order for a $(k \times n)$-matrix to have rank $k$, the rows must be linearly independent. The first row can be any vector having weight 1 in each of the $b$ components. There are $c^b$ such vectors, so $\rho(1,1) = c^b$ (i.e. $c^b$ is the total number of the admissible vectors). The second row must be independent of the first row. That means it cannot be equal to the first row. There are $(c^b - 1)$ choices for the second row and thus $\rho(2,2) = c^b(c^b - 1)$. The third row cannot be equal to one of the previous rows. But also, in our representation, it is impossible that two admissible vectors add to another admissible vector. Then we have $(c^b - 2)$ choices for the third row, so $\rho(3,3) = c^b(c^b - 1)(c^b - 2)$.

On the other hand, if we add three or more admissible vectors we may get another admissible vector. As a consequence, if we are considering the $i$-th row, we must discard on average $\xi(i-1)$ vectors and so we can choose only among $c^b - \xi(i-1)$.

(2) The set of the $(k \times n)$ matrices having rank exactly $k - 1$ is the disjoint union of two sets:

a) those having the first $k - 1$ rows linearly independent (and so the $k$-th row dependent on the previous $k - 1$ rows);

b) those having the first $k - 1$ rows linearly dependent (and so these rows have rank $k - 2$ and the $k$-th row is independent from them).

Therefore, the number of $(k \times n)$ matrices having rank exactly $k - 1$ is obtained by adding the following two values:

a) the number of $(k-1) \times n$ matrices having rank $k - 1$ multiplied by the number of all possible choices for the dependent row;

b) the number of $(k-1) \times n$ matrices having rank $k - 2$ multiplied by the number of all possible choices for the independent row.

• The number of $(k-1) \times n$ matrices having rank $k - 1$ is $\rho(k-1, k-1)$, for $k \ge 2$. In case $k = 2$, we have $\rho(1,1) = c^b$.

• The number of all possible choices for the dependent row is $\xi(k-1)$ for $k > 2$; if $k = 2$, the possible choice is exactly one, since the only second row we can choose is the first row.

• The number of $(k-1) \times n$ matrices having rank $k - 2$ is $\rho(k-1, k-2)$ and it makes sense for $k \ge 3$. When $k = 2$ we have to consider a matrix having exactly one row and with rank 0, so it is the zero row, but the zero row is not an admissible vector. In other words, when we have only two rows, the set in b) is empty. In case $k = 3$, we have $\rho(2,1) = c^b$, since the second row has to be equal to the first one.

• The number of all possible choices for the independent row is $(2^z - 2^{k-2})\,(c^b/2^z)$, and this holds for $k \ge 5$. For $k = 3$, we must choose a third row different from the first two. The first two are equal and so we have $c^b - 1$ choices. For $k = 4$, we must choose a fourth row outside the space generated by the first three, but only two of the first three are distinct and so we have $c^b - 2$ choices.

Putting everything together we obtain our formula.



□

3.3 Application to AES

Because of the AES structure, we assign the following values to the parameters we have previously introduced. Let $V = (\mathbb{F}_2)^r$ be our starting vector space with $r = 128$ and $W = (\mathbb{F}_2)^s$, $s > 128$. We need to establish $s$. We consider the quotient $\mathbb{F}_{256} = \mathbb{F}_2[x]/(\mathsf{m})$, where $\mathsf{m} = x^8 + x^4 + x^3 + x + 1 \in \mathbb{F}_2[x]$ is the AES-polynomial. So $m = 8$. According to the previous section, we consider $\varepsilon' : \mathbb{F}_{2^8} \to (\mathbb{F}_2)^{2^8}$ by means of a primitive element $\gamma$ of $\mathbb{F}_{256}$, which is a root of a primitive polynomial† $\mathsf{n} \in \mathbb{F}_2[x]$ of degree 8, and we define our parallel map $\varepsilon : V \to W$, with $r = mb = 128$ and $s = 2^m b = 4096$, as

$$\varepsilon(v_1, \dots, v_{16}) = (\varepsilon'(v_1), \dots, \varepsilon'(v_{16})).$$

We have that $\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\varepsilon) \rangle) = 4081$, by Proposition 3.13.

A typical round function of the AES cryptosystem consists of the composition of two parallel maps (AddRoundKey and SubBytes operations) and two non-parallel maps (ShiftRows and MixColumns operations). We view the SubBytes (and AddRoundKey) operation as a parallel map $\pi$

$$\pi : (\mathbb{F}_{2^8})^{16} \to (\mathbb{F}_{2^8})^{16}, \qquad (y_1, \dots, y_{16}) \mapsto (\pi_1(y_1), \dots, \pi_{16}(y_{16}))$$

where $y_i \in \mathbb{F}_{2^8}$ and $\pi_i \in \mathrm{Sym}(\mathbb{F}_{256})$, for $1 \le i \le 16$. In the SubBytes case, each component $\pi_i$, where $1 \le i \le 16$, is the composition of the inversion operation and an affine map; in the AddRoundKey case, we have a sum with the round-key. By Theorem 1.5, recalled in the first section, we have that $\mathrm{Sym}(\mathbb{F}_{256}) = \langle ax + b,\ x^{254} \rangle$, where $a, b \in \mathbb{F}_{256}$. We note that a parallel map can be linearized using elementary results from Representation Theory.

Moreover, we claim that ShiftRows is linear over $(\mathbb{F}_2)^{4096}$ and that MixColumns is not linear over $(\mathbb{F}_2)^{4096}$, as follows.

First of all, we recall the map that describes the ShiftRows operation:

$$\mathrm{sh} : (\mathbb{F}_{2^8})^{16} \to (\mathbb{F}_{2^8})^{16}$$

$$(y_1, y_2, \dots, y_{16}) \mapsto (y_1, y_6, y_{11}, y_{16}, y_5, y_{10}, y_{15}, y_4, y_9, y_{14}, y_3, y_8, y_{13}, y_2, y_7, y_{12}).$$

Denoting by $y = (y_1, \dots, y_{16})$, we note that

$$\varepsilon(y) = (\varepsilon'(y_1), \varepsilon'(y_2), \varepsilon'(y_3), \varepsilon'(y_4), \varepsilon'(y_5), \dots, \varepsilon'(y_{16}))$$

and

$$\varepsilon(\mathrm{sh}(y)) = (\varepsilon'(y_1), \varepsilon'(y_6), \varepsilon'(y_{11}), \varepsilon'(y_{16}), \varepsilon'(y_5), \dots, \varepsilon'(y_{12})).$$

The map sh is linearly extendible because $\sum_{i \in I} \varepsilon(v^i) = 0$ clearly implies the following equality $\sum_{i \in I} \varepsilon(\mathrm{sh}(v^i)) = 0$.

† note that $\mathsf{n} \ne \mathsf{m}$; we could not use $\mathsf{m}$ because it is not primitive.

According to Lemma 3.6, it is possible to construct the linear map

$$\Lambda_{\mathrm{sh}} : (\mathbb{F}_2)^{4096} \to (\mathbb{F}_2)^{4096}$$

and so the ShiftRows operation is linear over $(\mathbb{F}_2)^{4096}$.
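The linear extension of ShiftRows is simply a permutation of the sixteen 256-bit blocks of $W$. A minimal sketch follows, under two assumptions that are not in the paper: vectors of $W$ are stored as Python integers used as 4096-bit masks, and the byte value itself (rather than its discrete logarithm) selects the nonzero entry of each block. The names `sh`, `eps` and `lambda_sh` are illustrative.

```python
import random

# 1-based source positions of ShiftRows, as listed in the text
SHIFT_ROWS = [1, 6, 11, 16, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12]
BLOCK = 2 ** 8                      # each byte becomes a 256-bit block


def sh(y):
    """ShiftRows on a state of 16 bytes."""
    return [y[src - 1] for src in SHIFT_ROWS]


def eps(y) -> int:
    """Parallel embedding of 16 bytes into a 4096-bit word."""
    word = 0
    for j, v in enumerate(y):
        word |= 1 << (j * BLOCK + v)
    return word


def lambda_sh(w: int) -> int:
    """Candidate linear map on W: move block SHIFT_ROWS[j] to position j."""
    mask = (1 << BLOCK) - 1
    out = 0
    for j, src in enumerate(SHIFT_ROWS):
        block = (w >> ((src - 1) * BLOCK)) & mask
        out |= block << (j * BLOCK)
    return out


rng = random.Random(1)
y = [rng.randrange(256) for _ in range(16)]
print(lambda_sh(eps(y)) == eps(sh(y)))          # commutes with the embedding
a, b = rng.getrandbits(4096), rng.getrandbits(4096)
print(lambda_sh(a ^ b) == lambda_sh(a) ^ lambda_sh(b))   # additive on all of W
```

Because `lambda_sh` only moves bits, it is additive on the whole of $W$, not just on the admissible space.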

Now, we show that the MixColumns operation is not linear over $(\mathbb{F}_2)^{4096}$ using the following counterexample.

Example 3.20. Let $w_1, w_2, w_3, w_4 \in W$ such that $w_1 + w_2 + w_3 = w_4$:

w, = (£'(7^), ^'(y ), ^'(0), ^'(0), e'(0), • • • , £'(0)) 
W2 = ), £'(0), £'(0), £'(0), • • • , £'(0)) 

^/;3 = (£'(0),£'(0),£V),£'(0),£'(0),---,£'(0)) 

^4 = (£'(0),£'(y),£'(0),£'(0),£'(0),---,£'(0)). 

Now, we apply the MixColumns operation MC to each vector $w_1, w_2, w_3$, obtaining the following

MC'K) = (£'(y ), £'(7=^), £'(0), £'(f ^), £'(0), • • • , £'(0)) 
MC{W2) = (£'(7=^), £'(7^^), £'(7=^), £'(7^^), £'(0), • . • , £'(0)) 
MCM = (£'(y),£'(7=^),£'(7^^),£'(y),£'(0),--.,£'(0)) 

where 



MC 



Then we have that $MC(w_1) + MC(w_2) + MC(w_3)$ is

(£V),^V),£'(o)+£'(f)+£'(f'),£'(y),£'(o),---,£'(o)). 

The third component of the previous vector is a sum in $(\mathbb{F}_2)^{256}$ and it has weight equal to 3. So, the vector $MC(w_1) + MC(w_2) + MC(w_3)$ is an element of the admissible space but it is a non-admissible vector. Therefore, $MC(w_4) = MC(w_1 + w_2 + w_3) \ne MC(w_1) + MC(w_2) + MC(w_3)$ and so MixColumns is not linear over $W$. It means that MC is not linearly extendible.

Remark 3.21. If all the AES operations were parallel maps, it would be possible to linearize the "full" cryptosystem because the set of the parallel maps is a group with respect to the composition operation.

3.4 Application to PRESENT

As for AES, we assign the right values to our parameters, according to PRESENT's structure. Let $V = (\mathbb{F}_2)^r$ be our starting vector space with $r = 64$, and $W = (\mathbb{F}_2)^s$ with $s > 64$. We consider $\varepsilon' : \mathbb{F}_{2^4} \to (\mathbb{F}_2)^{2^4}$ and we define our parallel map $\varepsilon : V \to W$, with $r = mb = 64$ and $s = 2^m b = 256$, as

$$\varepsilon(v_1, \dots, v_{16}) = (\varepsilon'(v_1), \dots, \varepsilon'(v_{16})).$$

We note that $\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\varepsilon) \rangle) = 241$ (see Proposition 3.13).

A typical round function of the PRESENT cryptosystem consists of the composition of two parallel maps (addRoundKey and sBoxLayer operations) and one non-parallel map (pLayer operation). The addRoundKey (and sBoxLayer) operation is a parallel map $\pi$

$$\pi : (\mathbb{F}_{2^4})^{16} \to (\mathbb{F}_{2^4})^{16}, \qquad (t_1, \dots, t_{16}) \mapsto (\pi_1(t_1), \dots, \pi_{16}(t_{16}))$$

where $\pi_i \in \mathrm{Sym}(\mathbb{F}_{16})$. In the sBoxLayer case, each component $\pi_i$ ($1 \le i \le 16$) is given by the table in Subsection 1.7; when $\pi$ is the addRoundKey operation, we have only a bitwise sum with the round-key.

Moreover, it is easy to see that pLayer is not linear over $(\mathbb{F}_2)^{256}$.

Example 3.22. Let $w_1, w_2, w_3, w_4 \in W$ such that $w_1 + w_2 + w_3 = w_4$ and let $\zeta, \eta, \vartheta$ be distinct non-zero elements in $\mathbb{F}_{2^4}$. Suppose that

$$w_1 = (\varepsilon'(\zeta), \varepsilon'(\zeta), \varepsilon'(0), \varepsilon'(0), \varepsilon'(0), \dots, \varepsilon'(0))$$
$$w_2 = (\varepsilon'(\zeta), \varepsilon'(0), \varepsilon'(\zeta), \varepsilon'(0), \varepsilon'(0), \dots, \varepsilon'(0))$$
$$w_3 = (\varepsilon'(0), \varepsilon'(0), \varepsilon'(\zeta), \varepsilon'(0), \varepsilon'(0), \dots, \varepsilon'(0))$$
$$w_4 = (\varepsilon'(0), \varepsilon'(\zeta), \varepsilon'(0), \varepsilon'(0), \varepsilon'(0), \dots, \varepsilon'(0)).$$

Now, we apply the pLayer transformation pL to each vector $w_1, w_2, w_3$, obtaining the following

pL'(i^i) = {e\^),e'{Q)„ e\^),e\(i)„ e\^),e\(i)„ e\^),e' 

pL'(^2) = (^'(^), ^'(0)3, ^'(^), ^'(0)3, ^^'(^), ^^'(0)3, e'{d),e'{Q)^) 

pV.\w,) = (e'(0, ^'(0)3, ^'(0, ^'(0)3, ^'(0, ^'(0)3, ^'(0, ^'(0)3) 
pL'(«;4) = (£'(//), £'(0)3, £'(//), £'(0)3, £'(//), £'(0)3, £'(//), £'(0)3) 

where $\varepsilon'(0)_3$ means $(\varepsilon'(0), \varepsilon'(0), \varepsilon'(0))$. Then, we have that

pL'(u;4) = pL'(u;i+'u;2+'u;3) 7^ pL'(u;i)+pL'(u;2)+pL'(u;3) = {e'{rj)+e'{-&)+e'{0, ■ ■ •)> 

where the first component has weight 3, and so the pLayer is not a linear operation over $W$.

Remark 3.23. As in the AES case, if all of PRESENT's operations were parallel maps, it would be possible to linearize the "full" cryptosystem because the set of the parallel maps is a group with respect to the composition operation.
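The counterexample of Example 3.22 can be replayed concretely. The sketch below assumes the usual description of the PRESENT pLayer (bit $i$ of the 64-bit state moves to position $16i \bmod 63$, with bit 63 fixed), takes nibble 0 to be the least significant nibble, chooses $\zeta = \mathtt{0xF}$, and uses the nibble value directly as the index of the nonzero entry of each block. These conventions and all names are assumptions made for illustration only.

```python
def p_layer(state: int) -> int:
    """PRESENT bit permutation: bit i -> 16*i mod 63, bit 63 stays in place."""
    out = 0
    for i in range(64):
        if (state >> i) & 1:
            out |= 1 << (63 if i == 63 else (16 * i) % 63)
    return out


def to_state(nibbles) -> int:
    """Pack 16 nibbles into a 64-bit state, nibble 0 least significant."""
    return sum(n << (4 * k) for k, n in enumerate(nibbles))


def eps(state: int) -> int:
    """Embed a 64-bit state into a 256-bit word, one 16-bit block per nibble."""
    word = 0
    for k in range(16):
        word |= 1 << (16 * k + ((state >> (4 * k)) & 0xF))
    return word


ZETA = 0xF
pad = [0] * 13
s1 = to_state([ZETA, ZETA, 0] + pad)
s2 = to_state([ZETA, 0, ZETA] + pad)
s3 = to_state([0, 0, ZETA] + pad)
s4 = to_state([0, ZETA, 0] + pad)

# w1 + w2 + w3 = w4 holds in W ...
print(eps(s1) ^ eps(s2) ^ eps(s3) == eps(s4))
# ... but the relation is not preserved after the pLayer
image_sum = eps(p_layer(s1)) ^ eps(p_layer(s2)) ^ eps(p_layer(s3))
print(image_sum == eps(p_layer(s4)))
print(bin(image_sum & 0xFFFF).count("1"))        # weight of the first block
```

With these choices the first block of the summed images has weight 3, so the sum is a non-admissible vector, in agreement with the example.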

3.5 Application to SERPENT 

Let $V = (\mathbb{F}_2)^r$ be our starting vector space with $r = 128$. In order to identify the value of $s > r$, where $W = (\mathbb{F}_2)^s$, we have to consider the map

$$\varepsilon' : \mathbb{F}_{2^4} \to (\mathbb{F}_2)^{2^4}.$$

We define our parallel map $\varepsilon : V \to W$ with $r = mb = 128$ and $s = 2^m b = 512$ as

$$\varepsilon(v_1, \dots, v_{32}) = (\varepsilon'(v_1), \dots, \varepsilon'(v_{32})).$$

Note that $\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\varepsilon) \rangle) = 2^m b - (b - 1) = 481$.

The components of a typical round function are the parallel S-box, the affine transformation described in Subsection 1.6 and the translation with the round key. Obviously, key translation and S-box are parallel maps of type

$$\pi : (\mathbb{F}_{2^4})^{32} \to (\mathbb{F}_{2^4})^{32}, \qquad (t_1, \dots, t_{32}) \mapsto (\pi_1(t_1), \dots, \pi_{32}(t_{32}))$$

where $\pi_i \in \mathrm{Sym}(\mathbb{F}_{2^4})$.

Similarly to what was done for AES and PRESENT, we could provide a counterexample to show that the linear transformation of SERPENT is not linear over $(\mathbb{F}_2)^{512}$.

4 Results on a larger embedding

In this section we provide another specific embedding that can be seen as an improvement of the former (2). The new embedding can also be applied to AES, PRESENT and SERPENT. In Subsection 3.2 we considered $V = (\mathbb{F}_2)^r$ as a vector space and we found an embedding $V \to W$ such that the S-boxes and the key-additions become linear. However, in this way we lost the linearity of the Mixing Layer $\lambda$, and so here we make a larger embedding where the linearity of $\lambda$ is recovered, without losing the linearity of the key addition. We do lose the linearity of the S-boxes, but their non-linearity is probably kept low.

Starting from the setting we described in the previous section, we consider our parallel map $\varepsilon : (\mathbb{F}_{2^m})^b \to ((\mathbb{F}_2)^{2^m})^b$ defined as $\varepsilon(v_1, \dots, v_b) = (\varepsilon'(v_1), \dots, \varepsilon'(v_b))$.

Now, let $M$ be a matrix in $GL((\mathbb{F}_2)^{mb})$ and let $t$ be its order, $M^t = \mathrm{id}_V$. Let $V = (\mathbb{F}_2)^r$ be a vector space with dimension $r = mb$ and let $W = (\mathbb{F}_2)^s$ be the vector space with dimension $s = 2^m b t$. The space embedding $\alpha : V \to W$ we propose in this section is defined as follows

$$\alpha(v) = (\varepsilon(v), \varepsilon(Mv), \dots, \varepsilon(M^{t-1}v)). \qquad (5)$$

From now on, $\alpha$ denotes the map in (5). Thanks to Proposition 3.13, we can easily prove the following proposition:

Proposition 4.1. Let $V = (\mathbb{F}_2)^r$ be a vector space with dimension $r = mb$ and let $W = (\mathbb{F}_2)^s$ be the vector space with dimension $s = 2^m b t$. Let $\alpha$ be as in (5). Then we have

$$2^m b - (b - 1) \le \dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\alpha) \rangle) \le (2^m b - (b - 1))\,t.$$

Proof. By Proposition 3.13, $\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\varepsilon) \rangle) = 2^m b - (b - 1)$. Since

$$\{(\varepsilon(v), \varepsilon(Mv), \dots, \varepsilon(M^{t-1}v)) \mid v \in V\} \subset \{(\varepsilon(v_1), \dots, \varepsilon(v_t)) \mid v_1, \dots, v_t \in V\},$$

then

$$\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\alpha) \rangle) \le (2^m b - (b - 1))\,t.$$

On the other hand, considering the projection of $\{(\varepsilon(v), \varepsilon(Mv), \dots, \varepsilon(M^{t-1}v))\}$ on the first component (the first $b$ bytes), the lower bound follows immediately, again considering Proposition 3.13. □
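The bounds of Proposition 4.1 can be observed on a toy instance. In the sketch below, $m = 2$ and $b = 2$, so $V = (\mathbb{F}_2)^4$; the invertible matrix `M` is an arbitrary choice made for illustration and is not taken from any cipher, its order is found by brute force, and the block value is used directly as the index of the nonzero entry (which does not affect ranks).

```python
def rank_gf2(rows) -> int:
    """Rank over GF(2); each row is an int whose bits are the entries."""
    basis = {}
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]
    return len(basis)


def apply(M, v: int) -> int:
    """Multiply the binary matrix M (list of row bitmasks) by the bit-vector v."""
    out = 0
    for i, row in enumerate(M):
        out |= (bin(row & v).count("1") & 1) << i
    return out


def order(M, r: int) -> int:
    """Smallest t >= 1 with M^t = id (M must be invertible)."""
    identity = list(range(2 ** r))
    images, t = [apply(M, v) for v in identity], 1
    while images != identity:
        images = [apply(M, v) for v in images]
        t += 1
    return t


def eps(v: int, m: int, b: int) -> int:
    word = 0
    for j in range(b):
        block = (v >> (m * j)) & (2 ** m - 1)
        word |= 1 << (j * 2 ** m + block)
    return word


def alpha(v: int, M, t: int, m: int, b: int) -> int:
    """Embedding (5): concatenate eps(v), eps(Mv), ..., eps(M^(t-1) v)."""
    word, width = 0, 2 ** m * b
    for i in range(t):
        word |= eps(v, m, b) << (i * width)
        v = apply(M, v)
    return word


m, b = 2, 2
M = [0b0010, 0b0100, 0b1000, 0b0011]     # hypothetical invertible 4x4 matrix
t = order(M, m * b)
dim = rank_gf2([alpha(v, M, t, m, b) for v in range(2 ** (m * b))])
base = 2 ** m * b - (b - 1)
print(t, base, dim, base * t)            # order, lower bound, dimension, upper bound
```

In such a small example the dimension is also capped by $|V| = 2^{mb}$, the number of rows available.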

We can further improve Proposition 4.1 for a byte-oriented Mixing Layer.

Proposition 4.2. Let $V = (\mathbb{F}_2)^r$ be a vector space with dimension $r = mb$ and let $W = (\mathbb{F}_2)^s$ be the vector space with dimension $s = 2^m b t$. Let $M \in GL((\mathbb{F}_{2^m})^b)$. Let $\alpha$ be as in (5). Then we have

$$\dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\alpha) \rangle) \le 2^m b t - (bt - 1) - mb(t - 1).$$

Proof. Let $T = \langle \mathrm{Im}(\alpha) \rangle$. For any $w_1, w_2 \in W$, let $w_1 \cdot w_2$ denote their scalar product. It is sufficient to show that there exist $(bt - 1) + mb(t - 1)$ elements in $T^\perp$ that are linearly independent, where $T^\perp = \{w \in W \mid w \cdot t = 0,\ \forall t \in T\}$ is the orthogonal space of $T$ (or the "dual" of $T$, in coding theory notation). In fact, this means

$$\dim T^\perp \ge (bt - 1) + mb(t - 1)$$

and since $\dim T = \dim W - \dim T^\perp$, our result follows.

Consider the following matrix product with $M = (a_{ij})$

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1b} \\ a_{21} & a_{22} & \cdots & a_{2b} \\ \vdots & & & \vdots \\ a_{b1} & a_{b2} & \cdots & a_{bb} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_b \end{pmatrix} = \begin{pmatrix} v'_1 \\ v'_2 \\ \vdots \\ v'_b \end{pmatrix}.$$

Obviously, $v'_1 = \sum_{i=1}^{b} a_{1i} v_i$.

Let $S'$ be a subspace of $(\mathbb{F}_2)^m$ such that $\dim(S') = m - 1$. For any $1 \le i \le b$, let $S_i = \{\beta \in \mathbb{F}_{2^m} \mid \beta a_{1i} \in S'\}$. We note that $S_i$ is a subspace, that

$$\Big\{ \sum_{i=1}^{b} v_i a_{1i} \;\Big|\; v_i \in S_i,\ 1 \le i \le b \Big\} \subset S'$$

and that $|S_i| = |S'| = 2^{m-1}$. There exists a bijection via orthogonality between the sets $\{S \le (\mathbb{F}_2)^m \mid \dim(S) = m - 1\}$ and $\{S^\perp \le (\mathbb{F}_2)^m \mid \dim(S^\perp) = 1\}$; their cardinality is obviously $2^m - 1$. We can choose a linear basis for $S^\perp \cup \{0\}$, i.e. $S^\perp \cup \{0\} = \langle e_1^\perp, \dots, e_m^\perp \rangle$. Therefore, each row of $M$ generates $m$ linearly independent elements of $T^\perp$.

Two relations coming from two different rows are independent, since the matrix $M$ has full rank, for a total of $mb$ relations.

Now, we construct the elements of the orthogonal space that correspond to the relations induced by the rows of $M$. We are considering the case $(v, Mv)$ and we observe that

$$\sum_{i=1}^{b} v_i a_{1i} = (Mv)_1 \qquad (6)$$

where $v_i \in S_i$. Since $\varepsilon'(S_i) \subset (\mathbb{F}_2)^{2^m}$, we consider $w_i = \sum_{e \in \varepsilon'(S_i)} e$, where $\mathrm{w}(w_i) = |\varepsilon'(S_i)| = 2^{m-1}$ and $w_i \in (\mathbb{F}_2)^{2^m}$. The element of $T^\perp$ coming from (6) and $S'$ is

$$(w_1, \dots, w_b, w'_1, \dots, 0, \dots, 0)$$

where $w'_1 = \sum_{e \in \varepsilon'(S')} e$. Clearly, $m - 1$ similar elements come from (6) and $S'$.

If we consider the relations given by the $h$-th row of $M$, i.e. $\sum_{i=1}^{b} v_i a_{hi} = v'_h$, we obtain the following elements

$$(w_1, \dots, w_b, 0, \dots, w'_h, \dots, 0, \dots, 0).$$

At this point, we have constructed the $mb$ elements of the orthogonal space corresponding to the previous relations.

Instead of considering $(v, Mv)$, since clearly $M(M^i v) = M^{i+1} v$, we can apply the previous construction to each pair $(M^i v, M^{i+1} v)$, for $1 \le i \le t - 2$, obtaining the corresponding elements

$$(\underbrace{0, \dots, 0}_{b(i-1)}, \underbrace{w_1, \dots, w_b}_{b}, \underbrace{0, \dots, w'_h, \dots, 0}_{b}, \underbrace{0, \dots, 0}_{bt - (i+1)b}) \qquad (7)$$

We have found exactly $mb(t - 1)$ vectors in $T^\perp$. Since the pairs $(M^i v, M^{i+1} v)$ and $(M^j v, M^{j+1} v)$ with $i \ne j$ involve different bytes, the relations given by $(M^i v, M^{i+1} v)$ are independent from those given by $(M^j v, M^{j+1} v)$. Then we have $mb(t - 1)$ independent relations (i.e. linearly independent elements of the orthogonal space).

Thanks to Proposition 3.13, we have exactly $(bt - 1)$ further relations, corresponding to elements in $T^\perp$ of type



(8) 



with l<k<bt. 

The vectors (7) and (8) clearly form a linearly independent set. □

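As we have done in the previous section, we can construct the following matrix.

Let $\mathcal{P}$ be a subset of the plaintext set $\mathcal{M}$ such that $|\mathcal{P}| = \dim_{\mathbb{F}_2}(\langle \mathrm{Im}(\alpha) \rangle)$. Let $a^i \in \mathcal{P}$, $1 \le i \le |\mathcal{P}|$. We construct the $(|\mathcal{P}| \times 2^m b t)$-matrix $D$ such that the $i$-th row is the image of the map $\alpha$ applied to the plaintext $a^i \in \mathcal{P}$, for $i \in \{1, \dots, |\mathcal{P}|\}$:

$$D = \begin{pmatrix} \alpha(a^1) \\ \vdots \\ \alpha(a^{|\mathcal{P}|}) \end{pmatrix} = \begin{pmatrix} \varepsilon(a^1) & \varepsilon(Ma^1) & \cdots & \varepsilon(M^{t-1}a^1) \\ \vdots & \vdots & & \vdots \\ \varepsilon(a^{|\mathcal{P}|}) & \varepsilon(Ma^{|\mathcal{P}|}) & \cdots & \varepsilon(M^{t-1}a^{|\mathcal{P}|}) \end{pmatrix}$$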

Remark 4.3. We expect the rank of this matrix to have a behavior similar to 
that of matrix H (3), see Remark 3.17. Our experiments confirm this. 

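Let $\mathcal{Q}$ be the set of parallel maps $\pi : (\mathbb{F}_{2^m})^b \to (\mathbb{F}_{2^m})^b$ such that, for any $1 \le j \le b$, $\pi_j(x) = ax + c$, with $a \ne 0$, $c \in \mathbb{F}_{2^m}$ ($a$ and $c$ do not depend on $j$).
Let $\bar{\mathcal{Q}}$ be the set of parallel maps $\pi : (\mathbb{F}_{2^m})^b \to (\mathbb{F}_{2^m})^b$ such that, for any $1 \le j \le b$, $\pi_j(x) = x + d_j$, with $d_j \in \mathbb{F}_{2^m}$.

Note that both $\mathcal{Q}$ and $\bar{\mathcal{Q}}$ are subgroups of $\mathrm{Sym}((\mathbb{F}_{2^m})^b)$ and we define $\mathcal{G}$ as

$$\mathcal{G} = \langle \mathcal{Q}, \bar{\mathcal{Q}}, M \rangle \le \mathrm{Sym}((\mathbb{F}_{2^m})^b).$$

The following result holds:

Proposition 4.4. Let $\sigma$ be either an element of $\mathcal{Q}$ or an element of $\bar{\mathcal{Q}}$; then there exists $\Lambda_\sigma : W \to W$ which is linear.

Proof. We want to apply Lemma 3.6 and so we must only show that $\sigma$ is linearly extendible. Let $\{v^i\}_{i \in I} \subset V$ be such that $\sum_{i \in I} \alpha(v^i) = 0$; we have to prove that $\sum_{i \in I} \alpha(\sigma(v^i)) = 0$. Note that $\sum_{i \in I} \alpha(v^i) = 0$ is equivalent to

$$\sum_{i \in I} \big(\varepsilon'(v^i_1), \dots, \varepsilon'(v^i_b),\ \varepsilon'((Mv^i)_1), \dots, \varepsilon'((Mv^i)_b),\ \dots,\ \varepsilon'((M^{t-1}v^i)_1), \dots, \varepsilon'((M^{t-1}v^i)_b)\big) = 0.$$

Then we have the following system $S_j$ for any $1 \le j \le b$

$$\sum_{i \in I} \varepsilon'(v^i_j) = 0, \quad \sum_{i \in I} \varepsilon'((Mv^i)_j) = 0, \quad \dots, \quad \sum_{i \in I} \varepsilon'((M^{t-1}v^i)_j) = 0.$$

Using Lemma 3.12, we have that $S_j$ is equivalent to $S'_j$

$$\begin{cases} |\{\ell \mid v^\ell_j = v^i_j\}| \text{ is even } & \forall i \in I \\ |\{\ell \mid (Mv^\ell)_j = (Mv^i)_j\}| \text{ is even } & \forall i \in I \\ \quad \vdots \\ |\{\ell \mid (M^{t-1}v^\ell)_j = (M^{t-1}v^i)_j\}| \text{ is even } & \forall i \in I. \end{cases}$$

Suppose $\sigma \in \mathcal{Q}$, which means that $\sigma(v) = \sigma(v_1, \dots, v_b) = (\sigma_1(v_1), \dots, \sigma_b(v_b))$ where $\sigma_i(v_i) = a v_i + c$ for any $1 \le i \le b$ and $a \ne 0$, $c \in \mathbb{F}_{2^m}$. Since $M$ is linear, we have

$$(M^h \sigma(v^\ell))_j = (M^h(a v^\ell_1 + c, \dots, a v^\ell_b + c))_j = (a M^h v^\ell + M^h(c, \dots, c))_j = (a M^h v^\ell)_j + (M^h(c, \dots, c))_j = a (M^h v^\ell)_j + \bar{c},$$

where $\bar{c}$ is a constant independent of $\ell$.

We have that, $\forall i \in I$ and for any $1 \le h \le t - 1$, $|\{\ell \mid (M^h v^\ell)_j = (M^h v^i)_j\}|$ is even, and so $|\{\ell \mid a (M^h v^\ell)_j + \bar{c} = a (M^h v^i)_j + \bar{c}\}|$ is even. Thanks to Lemma 3.12, our thesis follows.

Suppose now that $\sigma \in \bar{\mathcal{Q}}$, i.e. $\sigma(v) = v + d$ for some $d \in V$. Since

$$(M^h \sigma(v^\ell))_j = (M^h(v^\ell + d))_j = (M^h v^\ell)_j + (M^h(d))_j = (M^h v^\ell)_j + \bar{d},$$

where $\bar{d}$ is a constant independent of $\ell$, and $|\{\ell \mid (M^h v^\ell)_j = (M^h v^i)_j\}|$ is even, we have that

$$|\{\ell \mid (M^h v^\ell)_j + \bar{d} = (M^h v^i)_j + \bar{d}\}|$$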
is even. By Lemma 3.12, our thesis follows. □

4.1 Application to AES

Let $V = (\mathbb{F}_2)^r$ be a vector space with dimension $r = 128$ and let $M : V \to V$ be the MixingLayer of AES, that is, the composition of ShiftRows and MixColumns. Since $M$ has order equal to 8 (i.e. $M^8 = \mathrm{id}_V$), the map $\alpha : V \to W$ we propose is defined as follows

$$\alpha(v) = (\varepsilon(v), \varepsilon(Mv), \dots, \varepsilon(M^7 v)), \qquad (9)$$

where $W = (\mathbb{F}_2)^s$ is the vector space with dimension $s = 2^m b t = 2^{15}$ and $\varepsilon$ is the map defined in Subsection 3.3: $\varepsilon : (\mathbb{F}_2)^{128} \to (\mathbb{F}_2)^{4096}$.

Let $T = \langle \mathrm{Im}(\alpha) \rangle$ with $\alpha$ as in (9). We can easily determine $\dim(T)$.

Fact 4.5. In the AES case we have

$$\dim_{\mathbb{F}_2}(T) = 2^m b t - (bt - 1) - mb(t - 1) = 31745.$$

Proof. Let $\lambda = 2^m b t - (bt - 1) - mb(t - 1)$. By computational experiments, we have found a $(\lambda \times 2^m b t)$ full rank matrix for the $\alpha$ representation in the AES case, which means $\dim_{\mathbb{F}_2} T \ge \lambda$. Thanks to Proposition 4.2 we conclude that $\dim_{\mathbb{F}_2} T = \lambda$. □

We note that the group

$$\mathcal{G} = \langle \mathcal{Q}, \bar{\mathcal{Q}}, M \rangle \le \mathrm{Sym}((\mathbb{F}_{2^8})^{16})$$

contains all the permutations of the AES-round function, except notably for the S-box operation.

Proposition 4.6. Let $M$ be the MixingLayer. Then $\alpha$ is a space embedding with respect to $\mathcal{G} = \langle \mathcal{Q}, \bar{\mathcal{Q}}, M \rangle$.

Proof. According to Proposition 4.4, there exists a linear map $\Lambda_\sigma : W \to W$ in case $\sigma$ is in $\mathcal{Q}$ or $\bar{\mathcal{Q}}$. We note that the previous result is independent of $M$. Let $\sigma$ be the MixingLayer $M$. Since $\alpha(v^i) = (\varepsilon(v^i), \varepsilon(Mv^i), \dots, \varepsilon(M^7 v^i))$ and

$$\alpha(Mv^i) = (\varepsilon(Mv^i), \varepsilon(M^2 v^i), \dots, \varepsilon(M^8 v^i)) = (\varepsilon(Mv^i), \varepsilon(M^2 v^i), \dots, \varepsilon(v^i)),$$

$\alpha(Mv^i)$ is a permutation of $\alpha(v^i)$. Obviously, we have that $\sum_{i \in I} \alpha(v^i) = 0$ implies $\sum_{i \in I} \alpha(Mv^i) = 0$. □

With a fixed $K$, the encryption $\phi_K$ is the composition of AddRoundKey, SubBytes and MixingLayer. So the only part of $\phi_K$ which is not linear (with our map $\alpha$) is the SubBytes operation.

4.2 Application to PRESENT

Let $V = (\mathbb{F}_2)^r$ be a vector space with dimension $r = 64$ and let $M : V \to V$ be the pLayer of PRESENT. Since $M^3 = \mathrm{id}_V$, the map $\alpha : V \to W$ we propose is defined as follows

$$\alpha(v) = (\varepsilon(v), \varepsilon(Mv), \varepsilon(M^2 v)), \qquad (10)$$

where $W = (\mathbb{F}_2)^s$ is the vector space with dimension $s = 2^m b t = 768$. Let $\alpha$ be as in (10) and $T = \langle \mathrm{Im}(\alpha) \rangle$. Also in this case it is possible to prove (with a computation) that $\dim_{\mathbb{F}_2}(T) = 2^m b t - (bt - 1) - mb(t - 1) = 593$.

With a fixed $K$, the encryption $\phi_K$ is the composition of addRoundKey, sBoxLayer and pLayer. So the only part of $\phi_K$ which is not linear (with our map $\alpha$) is the sBoxLayer operation.
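The dimension counts quoted for AES and PRESENT follow from plain arithmetic on the bound of Proposition 4.2, which by Fact 4.5 and the analogous PRESENT computation is attained in both cases. A small helper (the name `dim_bound` is illustrative) reproduces the figures; with $t = 1$ it reduces to the formula of Proposition 3.13.

```python
def dim_bound(m: int, b: int, t: int) -> int:
    """2^m*b*t - (b*t - 1) - m*b*(t - 1), the bound of Proposition 4.2."""
    return 2 ** m * b * t - (b * t - 1) - m * b * (t - 1)


print(dim_bound(8, 16, 8))    # AES, MixingLayer of order 8      -> 31745
print(dim_bound(4, 16, 3))    # PRESENT, pLayer of order 3       -> 593
print(dim_bound(8, 16, 1))    # AES, first embedding (t = 1)     -> 4081
print(dim_bound(4, 16, 1))    # PRESENT, first embedding         -> 241
print(dim_bound(4, 32, 1))    # SERPENT, first embedding         -> 481
```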

4.3 Application to SERPENT 

Let $V = (\mathbb{F}_2)^r$ be a vector space with dimension $r = 128$ and let $M : V \to V$ be the affine transformation of SERPENT. Since the order of $M$ is greater$^{11}$ than $2^{116}$, it is huge and impractical to consider the map $\alpha : V \to W$

$$\alpha(v) = (\varepsilon(v), \varepsilon(Mv), \dots, \varepsilon(M^i v), \dots), \qquad (11)$$

since $W = (\mathbb{F}_2)^s$ would have $s = 2^m b t > 2^4 \cdot 32 \cdot 2^{116} = 2^{125}$, making the rank computation impossible with present-day technology.

5 Attack strategies 

In this paper we do not report on successful attacks on (full versions of) the 
AES or other well-known ciphers. It is true that we have implemented several 
attacks aiming at distinguishing AES from random permutations, presented in 
some talks, and that we have collected some data indicating that our approach 
is likely to succeed. Yet, our data do not provide an overwhelming statistical 
evidence for the full cipher versions. Therefore, in this section we sketch some 
attack strategies that we have followed, without giving full details. 

The most difficult task in assessing the success of one of our embeddings 
is, by far, to estimate the non-linearity decrease of the cryptosystem. For 
example, a rigorous determination of the s-extendibility (Subsection 6.1) ap- 
pears completely out of reach. The only methods we can use to estimate the 
non-linearity fall are "a posteriori" checks on linear dependences. 

We have implemented only chosen-plaintext attacks, either with a single key or with related keys. In the single-key scenario, we proceed in four steps:



$^{11}$ to be precise it is 110329570561973845861261474090270635, as computed directly with MAGMA.


A possible intrinsic weakness of AES and other cryptosystems 



(1) we choose a set $S$ of $N$ $(31745 \times 2^{15})$-matrices, with rows taken from $T$ (Fact 4.5);

(2) we encrypt all matrices in $S$ (row by row) with a given key and compute their ranks;

(3) we compare their rank distribution with the expected rank distribution for a set of $N$ random $(31745 \times 2^{15})$-matrices, with rows taken from $T$, aiming at distinguishing the two distributions;

(4) to validate the distinguishing statistical test, we also create sets of $N$ random $(31745 \times 2^{15})$-matrices (in $T$) and we compare them with the expected distribution, aiming at not distinguishing them.
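The reference side of this procedure, steps (3) and (4), amounts to collecting a rank histogram for matrices whose rows are random admissible vectors. The sketch below does this for the toy embedding $\varepsilon$ with $m = 2$, $b = 2$ (so $z = 7$) rather than for the $(31745 \times 2^{15})$ AES matrices. It contains no encryption step, and the sampler, the seed and all names are assumptions made for illustration; it is not the authors' implementation.

```python
import random
from collections import Counter


def rank_gf2(rows) -> int:
    """Rank over GF(2); each row is an int whose bits are the entries."""
    basis = {}
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]
    return len(basis)


def random_admissible_row(rng: random.Random, m: int, b: int) -> int:
    """Image under eps of a uniformly random plaintext (one bit per block)."""
    word = 0
    for j in range(b):
        word |= 1 << (j * 2 ** m + rng.randrange(2 ** m))
    return word


def rank_histogram(num_matrices: int, num_rows: int, m: int, b: int,
                   seed: int = 0) -> Counter:
    """Histogram of ranks over `num_matrices` random admissible matrices."""
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(num_matrices):
        rows = [random_admissible_row(rng, m, b) for _ in range(num_rows)]
        hist[rank_gf2(rows)] += 1
    return hist


hist = rank_histogram(num_matrices=1000, num_rows=7, m=2, b=2)
for rank in sorted(hist):
    print(rank, hist[rank])
```

In an actual experiment, a second histogram would be collected from the encrypted rows and the two would be compared with a statistical test of one's choice.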

In the related-key scenario we proceed similarly. Let $n_k$ be the number of related keys:

(1) we choose a set $S$ of $N$ $(31745 \times 2^{15})$-matrices, with rows taken from $T$;

(2) we encrypt all matrices in $S$ (row by row) with all keys and compute their ranks;

(3) we compare their rank distribution with the expected rank distribution for a set of $N n_k$ random $(31745 \times 2^{15})$-matrices, with rows taken from $T$, aiming at distinguishing the two distributions;

(4) to validate the distinguishing statistical test, we also create sets of $N n_k$ random $(31745 \times 2^{15})$-matrices (in $T$) and we compare them with the expected distribution, aiming at not distinguishing them.

Apart from the obvious difference in how the single-key and related-key 
mechanisms are handled, the two scenarios are very similar, since in both we hope to 
spot a significant deviation by looking at ranks. Matrix ranks depend on the 
linear dependences among the rows and are much easier to compute and compare, 
so they are cheap indicators of non-linear behavior (see Marsaglia's 
test, e.g. [Sot98], [NIS00]). 

On the other hand, since a great many row dependences influence the rank, 
ranks are noisy indicators and force us to collect a huge number of samples. 
To maximize the effect of our embeddings on the rank, we need to choose S 
with a very specific rank distribution, e.g. with matrices of extremely low rank 
(while keeping all rows distinct). 
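
The rank-based comparison described above can be illustrated on a toy scale. The following is only a sketch under stated assumptions: it uses small uniformly random matrices over GF(2) instead of (31745 x 2^^)-matrices with rows in T, the `transform` argument is a hypothetical stand-in for "encrypt each row", and the chi-square statistic is one possible way to compare the observed and expected rank distributions, not necessarily the test used for the experiments.

```python
import random
from collections import Counter
from fractions import Fraction


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are Python ints (bit masks)."""
    basis = []
    for row in rows:
        for b in basis:
            # XOR with b lowers the value exactly when b's leading bit is set.
            row = min(row, row ^ b)
        if row:
            basis.append(row)
    return len(basis)


def rank_probability(m, n, r):
    """Exact probability that a uniform random m x n GF(2) matrix has rank r."""
    count = Fraction(1)
    for i in range(r):
        count *= Fraction((2**m - 2**i) * (2**n - 2**i), 2**r - 2**i)
    return count / 2 ** (m * n)


def empirical_rank_counts(m, n, samples, rng, transform=lambda row: row):
    """Rank histogram of `samples` random matrices; `transform` is a
    hypothetical placeholder for the row-by-row encryption step."""
    counts = Counter()
    for _ in range(samples):
        rows = [transform(rng.getrandbits(n)) for _ in range(m)]
        counts[gf2_rank(rows)] += 1
    return counts


def chi_square(counts, m, n, samples):
    """Chi-square statistic against the exact distribution (cells with
    expected count below 5 are skipped)."""
    stat = 0.0
    for r in range(min(m, n) + 1):
        expected = float(rank_probability(m, n, r)) * samples
        if expected >= 5:
            stat += (counts[r] - expected) ** 2 / expected
    return stat


rng = random.Random(0)
counts = empirical_rank_counts(8, 8, 2000, rng)
print(sorted(counts.items()), chi_square(counts, 8, 8, 2000))
```

Running the same routine twice, once with the identity `transform` (validation, step (4)) and once with the map under test (step (3)), mirrors the structure of the procedure; a large statistic only in the second run would point to a distinguishable rank distribution.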

A report on some experimental results can be found in [RSB10]. 

6 Further remarks and other results 

The first subsection contains some results on how our representation could 
achieve a weaker notion of linearity. 

In Subsection 6.2 we report other conceivable representations, which unfortunately 
are impractical. The main objective in these constructions is to identify 
the right compromise between computational feasibility and quantity of information 
that can be obtained. 

Then, in Subsection 6.3, we prove, using classical and easy arguments, 
that the AES cipher is unlikely to embed into a linear cipher unless 
one uses a vector space of huge dimension (which makes such an embedding useless in 
practice). 



6.1 On a weaker notion of linearity 



The results in this section are joint work with L. Maines and the proofs are 
contained in her Master's thesis [Mai09] (see also [MRS10]), supervised by the 
second author. 

The main goal sought in Section 3.1, Section 3.2, Section 4, and Section 6.2 is 
to find practical embedding of (F2)^^^ into a larger space where all components 
of the round function become linear. This is impossible, as shown in Section 
6.3, but what we achieve in Section 4 is an embedding where the non-linear 
maps are "not so far" from linear maps. There are many notions of 
"non-linearity", but none of them can be easily computed in our setting. When we 
say "not so far from linear", we mean that these functions behave with matrix 
ranks in a way similar to that of linear maps, as discussed in Section 5. 

However, we have been able to introduce a new non-linearity notion, which 
we call s-extendibility (Definition 6.1). We are not able to apply it to the 
embedding 

\[ \alpha : v \mapsto (\varepsilon(v), \varepsilon(Mv), \dots, \varepsilon(M^{t-1}v)), \tag{12} \]

but we can apply it^12 to 

\[ \alpha : v \mapsto (\varepsilon(v), \varepsilon(Mv)), \]

and so our definition and our results (the main result of this section is 
Theorem 6.6) should be seen as a step towards the complete understanding of the 
surviving non-linearity in (12). 

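Definition 6.1. Let $V = (\mathbb{F}_2)^r$ and $W = (\mathbb{F}_2)^s$, with $s > r$. Let $\sigma \in \mathrm{Sym}(V)$ 
and let $\alpha$ be an injective map $\alpha : V \to W$. We say that $\sigma$ is s-extendible (via 
$\alpha$) if, for all $\{v^h\}_{1 \le h \le s} \subset V$, we have 

\[ \sum_{h=1}^{s} \alpha(v^h) = 0 \implies \sum_{h=1}^{s} \alpha(\sigma(v^h)) = 0. \]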

Remark 6.2. If $v^1 = v^2$ and $v^3 = v^4$, then for all $\alpha$ and all $\sigma$ we have 

\[ \alpha(v^1) + \alpha(v^2) + \alpha(v^3) + \alpha(v^4) = 0 \]

and 

\[ \alpha(\sigma(v^1)) + \alpha(\sigma(v^2)) + \alpha(\sigma(v^3)) + \alpha(\sigma(v^4)) = 0. \]

So if we test the 4-extendibility of $\sigma$ only on these sets of vectors, we will find 
that any $\sigma$ is 4-extendible. We call these vectors "coupled vectors". 

^12 under specific conditions on M. 

We note that if $\sigma$ is s-extendible for all $s \in \mathbb{N}$, then $\sigma$ is linearly extendible, 
according to Definition 3.4. Moreover, any linear map is s-extendible for all s. A 
random map is 2-extendible but (with high probability) it is not s-extendible 
for any $s \ge 4$. Therefore, any 4-extendible map can be considered closer to a 
linear map. We would like to have results on our embedding concerning the 
s-extendibility of maps. A first result in this direction is obtained using the 
space embedding 

\[ \alpha(v) = (\varepsilon(v), \varepsilon(Mv)), \tag{13} \]

where M is an (n x n)-matrix with entries in $\mathbb{F}_{2^m}$, as we are going to explain. 
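
To make the notion concrete, here is a brute-force sketch of the s-extendibility test of Definition 6.1 on a toy space. Everything in it is an illustrative assumption: V is taken to be $(\mathbb{F}_2)^3$ encoded as the integers 0..7 (addition is XOR), and the two embeddings `identity` and `one_hot` are hypothetical examples chosen for illustration, not the embeddings $\varepsilon$ or $\alpha$ of this paper.

```python
from functools import reduce
from itertools import combinations_with_replacement
from operator import xor


def xor_sum(values):
    """Sum in a GF(2)-vector space whose elements are encoded as ints."""
    return reduce(xor, values, 0)


def is_s_extendible(sigma, alpha, space, s):
    """Check: sum alpha(v^h) = 0 implies sum alpha(sigma(v^h)) = 0,
    over all multisets {v^1, ..., v^s} of elements of `space`."""
    for vs in combinations_with_replacement(space, s):
        if xor_sum(alpha(v) for v in vs) == 0:
            if xor_sum(alpha(sigma(v)) for v in vs) != 0:
                return False
    return True


space = range(8)                 # toy V = (F_2)^3


def identity(v):                 # hypothetical embedding: V into itself
    return v


def one_hot(v):                  # hypothetical embedding: V into (F_2)^8
    return 1 << v


def gray(v):                     # a linear bijection of V
    return v ^ (v >> 1)


def swap01(v):                   # a non-linear permutation (swaps 0 and 1)
    return {0: 1, 1: 0}.get(v, v)


print(is_s_extendible(gray, identity, space, 4))
print(is_s_extendible(swap01, identity, space, 4))
print(is_s_extendible(swap01, one_hot, space, 4))
```

With the one-hot embedding, four images can only sum to zero when the vectors are coupled in the sense of Remark 6.2, so in this toy setting every permutation passes the 4-extendibility test; this is the degenerate situation the remark warns about.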

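Definition 6.3. Let $i, j, x, y, \alpha, \beta, \dots \in \mathbb{F}_{2^m}$ and let M be an (n x n)-matrix with 
entries in $\mathbb{F}_{2^m}$: 

\[
M = \begin{pmatrix}
m_{11} & m_{12} & \dots & m_{1n} \\
m_{21} & m_{22} & & \\
\vdots & & \ddots & \\
m_{n1} & & & m_{nn}
\end{pmatrix}.
\]

The vectors $w_1, w_2, w_3, w_4 \in (\mathbb{F}_{2^m})^{2n}$ are 4-related vectors if they can be 
permuted in order to have the following form (the top row gives the coordinate positions): 

\[
\begin{array}{c|cccccc}
 & 1 & 2 & \dots & n+1 & \dots & 2n \\ \hline
w_1 & (i, & x, & \dots, & m_{11} i + m_{12} x + \dots, & \dots, & m_{n1} i + m_{n2} x + \dots) \\
w_2 & (i, & y, & \dots, & m_{11} i + m_{12} y + \dots, & \dots, & m_{n1} i + m_{n2} y + \dots) \\
w_3 & (j, & x, & \dots, & m_{11} j + m_{12} x + \dots, & \dots, & m_{n1} j + m_{n2} x + \dots) \\
w_4 & (j, & y, & \dots, & m_{11} j + m_{12} y + \dots, & \dots, & m_{n1} j + m_{n2} y + \dots)
\end{array}
\]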

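Four related vectors $w_1, \dots, w_4$ are admissible vectors $\alpha(v_1) = (\varepsilon(v_1), \varepsilon(Mv_1))$, 
$\alpha(v_2) = (\varepsilon(v_2), \varepsilon(Mv_2))$, $\alpha(v_3) = (\varepsilon(v_3), \varepsilon(Mv_3))$, $\alpha(v_4) = (\varepsilon(v_4), \varepsilon(Mv_4))$ 
such that 

\[ \varepsilon(v_1) + \varepsilon(v_2) + \varepsilon(v_3) + \varepsilon(v_4) = 0, \]

but we do not know the sum $\varepsilon(Mv_1) + \varepsilon(Mv_2) + \varepsilon(Mv_3) + \varepsilon(Mv_4)$. 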

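Let $\sigma$ be a parallel map over $(\mathbb{F}_{2^m})^n$. The image of 4-related vectors via 
$\sigma$ can be seen as 

\[
\begin{array}{c|cccccc}
 & 1 & 2 & \dots & n+1 & \dots & 2n \\ \hline
w_1 \mapsto & (\sigma(i), & \sigma(x), & \dots, & m_{11}\sigma(i) + m_{12}\sigma(x) + \dots, & \dots, & m_{n1}\sigma(i) + m_{n2}\sigma(x) + \dots) \\
w_2 \mapsto & (\sigma(i), & \sigma(y), & \dots, & m_{11}\sigma(i) + m_{12}\sigma(y) + \dots, & \dots, & m_{n1}\sigma(i) + m_{n2}\sigma(y) + \dots) \\
w_3 \mapsto & (\sigma(j), & \sigma(x), & \dots, & m_{11}\sigma(j) + m_{12}\sigma(x) + \dots, & \dots, & m_{n1}\sigma(j) + m_{n2}\sigma(x) + \dots) \\
w_4 \mapsto & (\sigma(j), & \sigma(y), & \dots, & m_{11}\sigma(j) + m_{12}\sigma(y) + \dots, & \dots, & m_{n1}\sigma(j) + m_{n2}\sigma(y) + \dots)
\end{array}
\]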

Definition 6.4. 4-related vectors $w_1, \dots, w_4$ are totally 4-related if 

+ ^2 + + ^4 = 0. 

Definition 6.5. Given $(x, y, z, a, b, c) \in \mathbb{N}^6$ and an (n x n)-matrix M, we 
say that (x, y, z, a, b, c) fits M if the following sums of elements of det(M) are 
non-zero: 

• the sums having a number of elements equal to 

i (" " 1 (:: !) C ± (" - ") (:: : t) i: : s (" r) (: : :) e : >■ 

when $z = 0, x \ne 0, y \ne 0$; when $y = 0, x \ne 0, z \ne 0$; when $x = 0, y \ne 0, z \ne 0$; 

• the sums having a number of elements equal to 

fig("7l(::!)Cr)C%"-;')C""r'"0*' 

when $x \ne 0, y \ne 0, z \ne 0$. 

The main result of this section is the next theorem, which gives sufficient 
conditions on M in order to make all $\sigma : V \to V$ into 4-extendible maps. 

Theorem 6.6. Let M be an (n x n)-matrix, with entries in $\mathbb{F}_{2^m}$, such that: 

(1) $\det(M) \ne 0$; 

(2) all the k x k minors are non-zero (0 < k < n); 

(3) all sextuples (x, y, z, a, b, c) such that 

• $0 \le a, b, c \le n$; 

• $a + b + c = 2n$; 

• $a \ge b \ge c$; 

• $0 \le x, y, z \le n$; 

• $x + y + z = n$; 

• $x \le a$, $y \le b$, $z \le c$; 

fit M. 

Then any 4-related vectors are totally related if and only if they are coupled. 
Thanks to Theorem 6.6 and Remark 6.2, we have the following 

Corollary 6.7. Under the hypotheses of Theorem 6.6, any map is 4-extendible. 

6.2 Other embeddings of this kind 

We can also build other embeddings similar to those described in previous 
sections. The main objective in these constructions is to identify the right 
compromise between computational feasibility and quantity of information 
that can be obtained. In Section 3.2, we constructed the embedding $\varepsilon$, which 
makes linear the S-box maps, the classical non-linear maps of a cryptosystem. 
We had to abandon the linearity of MixColumns 
(for AES) and of the pLayer (in the case of PRESENT). In order to use some more 
information about MixColumns (or the pLayer for PRESENT), we have 
considered the embedding given in Section 4: 

\[ \alpha(v) = (\varepsilon(v), \varepsilon(Mv), \dots, \varepsilon(M^{t-1}v)), \]

where M is the full Mixing Layer. The strength of this embedding is that we 
can exploit the low order of M to force the linearity of M. The disadvantages 
are that we have lost some computational efficiency and that the S-box is 
non-linear again (but with a lower non-linearity). 

For AES, we also considered the embedding given by 

\[ \alpha(v) = (\varepsilon(v), \varepsilon(MC(v)), \dots, \varepsilon(MC^{3}(v))), \]

since the order of MixColumns is equal to 4 and the MixColumns operation 
was the only non-linear one in Section 3.2. Unfortunately, in this context 
both ShiftRows and the parallel maps are non-linear, and so we put 
this idea aside. 

Although the following two embeddings could provide a lot of information 
about a cryptosystem, 

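• $\alpha(v) = (\varepsilon(v), \varepsilon((M \circ \mathrm{Sbox})v), \dots, \varepsilon((M \circ \mathrm{Sbox})^{t-1}v))$ (with $t = o(M \circ \mathrm{Sbox})$) 

they are very impractical, since the order of $(M \circ \mathrm{Sbox})$ and of $(\gamma\lambda\sigma_k)$ is huge. 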

6.3 On complete linearizations of AES 

Let $\mathcal{C}$ be any block cipher such that the plain-text space $\mathcal{M}$ coincides with 
the cipher space. Let $\mathcal{K}$ be the key space. Any key $k \in \mathcal{K}$ induces a permutation 
$\tau_k$ on $\mathcal{M}$. Since $\mathcal{M}$ is usually $V = (\mathbb{F}_2)^n$ for some $n \in \mathbb{N}$, we can consider 
$\tau_k \in \mathrm{Sym}(V)$. We denote by $\Gamma = \Gamma(\mathcal{C})$ the subgroup of $\mathrm{Sym}(V)$ generated by 
all the $\tau_k$'s. Unfortunately, the knowledge of $\Gamma(\mathcal{C})$ is out of reach for the most 
important block ciphers, such as the AES [Nat01] and the DES [Nat77]. However, 
researchers have been able to compute another related group. Suppose 
that $\mathcal{C}$ is the composition of $l$ rounds (the division into rounds is provided in 
the document describing the cipher). Then any key k would induce $l$ permutations, 
$\tau_{k,1}, \dots, \tau_{k,l}$, whose composition is $\tau_k$. For any round h, we can consider 
$\Gamma_h(\mathcal{C})$ as the subgroup of $\mathrm{Sym}(V)$ generated by the $\tau_{k,h}$'s (with k varying in 
$\mathcal{K}$). We can thus define the group $\Gamma_\infty = \Gamma_\infty(\mathcal{C})$ as the subgroup of $\mathrm{Sym}(V)$ 
generated by all the $\Gamma_h$'s. Obviously, $\Gamma \le \Gamma_\infty$. The group $\Gamma_\infty$ is traditionally 
called the group generated by the round functions with independent sub-keys. 
This group is known for some important ciphers; for example, we have 

Proposition 6.8 ([SW08], [Wer02]). 

\[ \Gamma_\infty(\mathrm{AES}) = \mathrm{Alt}((\mathbb{F}_2)^{128}). \]

It is very likely (and it is common belief among researchers) that $\Gamma_{\mathrm{AES}} = 
\Gamma_\infty(\mathrm{AES}) = \mathrm{Alt}((\mathbb{F}_2)^{128})$. Assuming this, we discuss in this section the 
possibility of viewing $\Gamma_{\mathrm{AES}}$ as a subgroup of $GL(V)$ with V of small dimension. In 
Cryptography it is customary to present estimates as powers of two, so our 
problem becomes to find the smallest $\ell$ such that $\Gamma_{\mathrm{AES}}$ can be linearized in 
$GL((\mathbb{F}_2)^{2^\ell})$. A classical proof is given in [Wag76] that $\ell = 128$. We feel it is 
desirable to obtain a result with a simpler proof. Our estimate is weaker than 
Wagner's, but strong enough to show the infeasibility of the linearization. 

There are two obvious ways to show that a finite group A cannot be 
contained (as isomorphic image) in a finite group B. The first is to show that 
$|A| > |B|$; the second is to show that there is $\eta \in A$ such that its order is 
strictly larger than the maximum element order in B. Subsection 6.3.1 presents 
our result using the first approach and we show that $\ell \ge 67$, which is more than 
enough to ensure the infeasibility of the linearization attack. This subsection's 
argument is completely elementary. Subsection 6.3.2 presents our result using 
the second approach and we show again that $\ell \ge 67$. It is interesting that, 
although here some more advanced arguments are needed (results in number 
theory), we reach the same estimate. 

6.3.1 First approach 

In this subsection we show that the order of $\mathrm{Alt}((\mathbb{F}_2)^{128})$ is strictly larger 
than the order of $GL(V)$, with $V = (\mathbb{F}_2)^{2^{66}}$, so that $\ell \ge 67$. 

We begin by proving a lemma. 

Lemma 6.9. The following inequality holds: 

\[ 2^{(2^7)^{19}} < 2^{128}! < 2^{(2^7)^{20}}. \]

Proof. Let $n = 2^7$; we have to show $2^{n^{19}} < 2^n! < 2^{n^{20}}$. We first show that 
$2^{n^{19}} < 2^n!$. The following inequality holds for $1 \le i \le n-2$ and $1 \le h \le 2^{n-i}$: 

\[ \frac{1}{2^{n-i}} \ge \frac{1}{2^{n-i+1} - h}. \tag{14} \]

Clearly 

\[ 2^n! > 2^{n^{19}} \iff 2^n (2^n - 1)! > 2^n \cdot 2^{n^{19}-n} \iff (2^n - 1)(2^n - 2)! > 2^{n^{19}-n} \cdot \frac{2^n - 1}{2^n - 1}. \]

We apply (14) with i = 1 and h = 1, and so it suffices to prove 

\[ (2^n - 1)(2^n - 2)! > 2^{n^{19}-n} \cdot \frac{2^n - 1}{2^{n-1}}, \]

i.e. $(2^n - 2)! > 2^{n^{19}-n-(n-1)}$. We use the same inequality for all $2 \le h \le 2^{n-1}$ 
and we obtain that we must verify $(2^{n-1} - 1)! > 2^{n^{19}-n-2^{n-1}(n-1)}$. Then we 
proceed by applying (14) for all $2 \le i \le n-2$ and all $1 \le h \le 2^{n-i}$, until 
we are left with the task of proving 

\[ 1 > 2^{\,n^{19} - n - \sum_{i=1}^{n-1} 2^{n-i}(n-i)}, \quad \text{that is,} \quad 0 > n^{19} - n - \sum_{i=1}^{n-1} 2^{n-i}(n-i). \tag{15} \]

A direct check shows that the right-hand inequality of (15) holds when $n = 2^7$. 

We are left to demonstrate the following inequality: $2^n! < 2^{n^{20}}$. 
We proceed by induction for $2 \le n \le 2^7$. In this range a computer computation 
shows that 

\[ n^{20} + 2^n n + 2^n < (n+1)^{20}. \tag{16} \]

When n = 2, we have $2^2! < 2^{2^{20}}$. Suppose that $2^n! < 2^{n^{20}}$ and $n < 2^7$. 
We have to prove that $2^{n+1}! < 2^{(n+1)^{20}}$. Since $2^{n+1}! = (2^n \cdot 2)! = 2^n!\,(2^n + 1) \cdots (2^n + 2^n)$, 
we have 

\[ 2^n!\,(2^n+1) \cdots (2^n+2^n) < 2^{n^{20}+n+1}\,(2^n+2) \cdots (2^n+2^n) < 2^{n^{20}+2^n(n+1)} = 2^{n^{20}+2^n n+2^n} \]

and, applying (16), we get $2^{n^{20}+2^n n+2^n} < 2^{(n+1)^{20}}$. 
Then the claimed inequality $2^{n+1}! < 2^{(n+1)^{20}}$ follows. □ 
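
The two "direct checks" used in the proof are easy to reproduce with exact integer arithmetic. The sketch below is not the authors' computation; it assumes inequality (15) reads $0 > n^{19} - n - \sum_{i=1}^{n-1} 2^{n-i}(n-i)$ and inequality (16) reads $n^{20} + 2^n n + 2^n < (n+1)^{20}$, and it adds an independent floating-point estimate of $\log_2(2^{128}!)$ through `math.lgamma` as a sanity check.

```python
import math


def inequality_15_holds(n: int) -> bool:
    """Exact check of 0 > n^19 - n - sum_{i=1}^{n-1} 2^(n-i) (n-i)."""
    total = sum(2 ** (n - i) * (n - i) for i in range(1, n))
    return n ** 19 - n - total < 0


def inequality_16_holds(n: int) -> bool:
    """Exact check of n^20 + 2^n n + 2^n < (n+1)^20."""
    return n ** 20 + 2 ** n * n + 2 ** n < (n + 1) ** 20


n = 2 ** 7
lower_ok = inequality_15_holds(n)                             # lower bound of the lemma
upper_ok = all(inequality_16_holds(k) for k in range(2, n))   # every induction step

# Independent estimate: log2((2^128)!) via the log-gamma function.
log2_factorial = math.lgamma(float(2 ** n) + 1.0) / math.log(2.0)
print(lower_ok, upper_ok, math.log2(log2_factorial))
```

The last printed value is the base-2 logarithm of $\log_2(2^{128}!)$; the lemma says it must lie strictly between 133 (since $n^{19} = 2^{133}$) and 140 (since $n^{20} = 2^{140}$).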

Our result is contained in the following proposition. 

Proposition 6.10. Let $V = (\mathbb{F}_2)^{2^\ell}$ with $\ell \ge 2$. If $G \le GL(V)$, with G 
isomorphic to $\mathrm{Alt}((\mathbb{F}_2)^{128})$, then $\ell \ge 67$. 

Proof. If $G \le GL(V)$, then $|G| \le |GL(V)|$. But $|\mathrm{Sym}((\mathbb{F}_2)^{128})| = 2^{128}! > 2^{2^{133}}$ 
thanks to Lemma 6.9 and so 

\[ |G| = |\mathrm{Alt}((\mathbb{F}_2)^{128})| = \frac{|\mathrm{Sym}((\mathbb{F}_2)^{128})|}{2} > \frac{2^{2^{133}}}{2} = 2^{2^{133}-1} > 2^{(2^{66})^2} > |GL((\mathbb{F}_2)^{2^{66}})|. \]

Therefore, $\ell = 66$ is not large enough. □ 

Remark 6.11. We could improve the previous bound to $\ell \ge 68$ by using the 
finite version of the Stirling formula: 

\[ n\log_2 n - n\log_2(e) < \log_2(n!) < n\log_2 n - n\log_2(e) + \log_2 n, \qquad \left(\frac{n}{e}\right)^n < n! < n\left(\frac{n}{e}\right)^n. \]
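
The improvement to 68 can be checked with a few lines of exact arithmetic. This is a sketch under two assumptions that are standard facts rather than statements of the text: the lower Stirling bound $\log_2(n!) > n\log_2 n - n\log_2 e$, and the crude estimate $|GL((\mathbb{F}_2)^N)| < 2^{N^2}$. The rational constant `LOG2_E_UPPER` is an upper approximation of $\log_2 e$ chosen for this illustration.

```python
from fractions import Fraction

# log2(e) = 1.442695..., so 1.4427 is a safe upper approximation.
LOG2_E_UPPER = Fraction(14427, 10000)


def log2_factorial_lower_bound(u: int) -> Fraction:
    """Lower bound on log2(n!) for n = 2^u, from n*log2(n) - n*log2(e)."""
    n = 2 ** u
    return n * u - n * LOG2_E_UPPER


def order_argument_rules_out(ell: int, u: int = 128) -> bool:
    """True when |Alt((F_2)^u)| = n!/2 certainly exceeds |GL((F_2)^(2^ell))|,
    using |GL((F_2)^N)| < 2^(N*N) with N = 2^ell."""
    big_n = 2 ** ell
    return log2_factorial_lower_bound(u) - 1 >= big_n * big_n


smallest_surviving = next(
    ell for ell in range(2, 200) if not order_argument_rules_out(ell)
)
print(smallest_surviving)
```

Every $\ell$ up to 67 is excluded by the comparison of group orders, so the first exponent that this counting argument cannot exclude is 68.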

6.3.2 Using the order of the elements 

In this subsection we compare the maximum order of elements in the two 
groups $\mathrm{Alt}((\mathbb{F}_2)^{128})$ and $GL((\mathbb{F}_2)^{2^\ell})$. We use permutations of even order. We 
denote by $o(\sigma)$ the order of any permutation $\sigma$, with $\sigma \in \mathrm{Alt}((\mathbb{F}_2)^{128})$ or 
$\sigma \in GL((\mathbb{F}_2)^{2^\ell})$. 

The best available result for $GL((\mathbb{F}_2)^N)$ is given by the following theorem. 

Theorem 6.12 ([Dar05]). Let $\sigma \in GL((\mathbb{F}_2)^N)$, with $o(\sigma)$ even and $N > 4$. 
Then 

\[ o(\sigma) \le 2(2^{N-2} - 1) = 2^{N-1} - 2. \]

Moreover, there is $\sigma \in GL((\mathbb{F}_2)^N)$ whose order attains the upper bound. 

Proof. It comes directly from Theorem 1 in [Dar05], with p = q = 2 and 
N > 4 (so points (a) and (b) do not apply). □ 

As regards the order of the elements in $\mathrm{Alt}((\mathbb{F}_2)^{128})$, we would like to use 
the following theorem. 

Theorem 6.13 ([DM96]). Let $u \ge 3$ and $n = 2^u$. Then $\mathrm{Alt}((\mathbb{F}_2)^u)$ contains 
an element $\eta$ of order (strictly) greater than $e^{\sqrt{(1/4)\, n \ln n}}$. 

The previous theorem is the special case of Theorem 5.1.A at p. 145 in 
[DM96] when g = 2. 

In order to compare the two estimates coming from Theorem 
6.12 and Theorem 6.13, we rewrite Theorem 6.13 as follows, so as to have 
$o(\eta)$ even. Our proof is an easy adaptation of the proof contained in [DM96]. 

Theorem 6.14. Let $u \ge 7$ and $n = 2^u$. Then $\mathrm{Alt}((\mathbb{F}_2)^u)$ contains an element 
$\eta$ with $o(\eta) > e^{\sqrt{(1/4)\, n \ln n}}$ and $o(\eta)$ even. 

Proof. Let $z$ be a prime number such that $4 + \sum_{3 \le p \le z} p \le n$, where the sum 
runs over (distinct odd) prime numbers. Then $\mathrm{Alt}((\mathbb{F}_2)^u)$ contains an element 
$\eta_z = \sigma\sigma'\sigma_3 \cdots \sigma_p \cdots \sigma_z$ such that: $\sigma$ and $\sigma'$ are transpositions, $\sigma_p$ is a cycle 
of length p, and all cycles $\{\sigma, \sigma', \sigma_3, \dots, \sigma_z\}$ act on disjoint subsets of $(\mathbb{F}_2)^u$. 
In other words, the non-trivial cycles of $\eta_z$ are two transpositions and some 
cycles with length $3, \dots, z$. As a consequence, the order of $\eta_z$ is $2\prod_{3 \le p \le z} p$. 
We are going to show that there is $z \in \mathbb{N}$ such that 

\[ 4 + \sum_{2 < p \le z} p \le n \quad \text{and} \quad (\vartheta(z))^2 > \frac{1}{4}\, n \ln(n), \]

where $\vartheta(z) = \ln(o(\eta_z)) = \ln(2) + \sum_{2 < p \le z} \ln(p)$. In the following we consider 
$\vartheta^*(z) = \vartheta(z) - \ln(2) = \sum_{2 < p \le z} \ln(p)$. 

Since $n \ge 2^7$, we note that $4 + \sum_{2 < p \le 19} p = 79 < 128 \le n$. 

Let $f(z) = z / \ln(z)$. Since $f(z)$ is an increasing function for real $z > e$, in case 
z is real and $z \ge 19$, we have that 

\[ f(4)\ln(4) + f(3)\ln(3) = 7 < f(19)\ln(3) \le f(z)\ln(3) \tag{17} \]

and so we can write (if $z \ge 19$ and $z \in \mathbb{R}$) 

\[
\begin{aligned}
4 + \sum_{2 < p \le z} p &= f(4)\ln(4) + \sum_{2 < p \le z} f(p)\ln(p) \\
&= f(4)\ln(4) + f(3)\ln(3) + \sum_{3 < p \le z} f(p)\ln(p) \\
&< f(z)\ln(3) + \sum_{3 < p \le z} f(p)\ln(p) \\
&\le \sum_{2 < p \le z} f(z)\ln(p) = f(z) \sum_{2 < p \le z} \ln(p) = f(z)\,\vartheta^*(z).
\end{aligned}
\]

We shall choose $z \ge 19$ such that $f(z)\vartheta^*(z) = n$. Such a z exists because 
$f(19)\vartheta^*(19) < 100 < n$ and $f(z)\vartheta^*(z)$ is an increasing function assuming all 
values. 

Since $\vartheta^*(z) > z/2$ for all $z \ge 19$, we have 

\[ n = \frac{z\,\vartheta^*(z)}{\ln(z)} < \frac{2(\vartheta^*(z))^2}{\ln(2\vartheta^*(z))} = \frac{4(\vartheta^*(z))^2}{2\ln(2\vartheta^*(z))} = f\big(4(\vartheta^*(z))^2\big). \]

However we also have $f(n\ln(n)) < n$. Since $f$ is an increasing function, this 
shows that $n\ln(n) < 4(\vartheta^*(z))^2 < 4(\vartheta(z))^2$. It is now enough to replace z with 
the largest prime not exceeding z. □ 
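
The construction in the proof can be tried out numerically for small u. The sketch below is only an illustration: it builds the cycle type "two transpositions plus cycles of consecutive odd prime lengths 3, 5, 7, ..." greedily (as many primes as fit into n points, which is an assumption of this example and uses at least as many primes as the z of the proof), and compares the logarithm of the resulting order with $\sqrt{(1/4)\, n \ln n}$. The helper names are hypothetical.

```python
import math


def odd_primes_up_to(limit: int) -> list:
    """Odd primes <= limit, by a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(3, limit + 1) if sieve[p]]


def log_order_of_eta(n: int) -> float:
    """ln of the order 2 * 3 * 5 * ... of an even permutation on n points made
    of two transpositions and disjoint cycles of consecutive odd prime lengths."""
    used_points = 4                  # the two transpositions
    log_order = math.log(2)
    for p in odd_primes_up_to(n):
        if used_points + p > n:
            break
        used_points += p
        log_order += math.log(p)
    return log_order


def dm_bound(n: int) -> float:
    """The exponent sqrt((1/4) n ln n) appearing in Theorems 6.13 and 6.14."""
    return math.sqrt(0.25 * n * math.log(n))


def f(z: float) -> float:
    return z / math.log(z)


def theta_star(z: int) -> float:
    return sum(math.log(p) for p in odd_primes_up_to(z))


for u in range(7, 15):
    n = 2 ** u
    print(u, round(log_order_of_eta(n), 2), round(dm_bound(n), 2))
```

For each u in this range the first printed number (the logarithm of an even element order) exceeds the second (the bound), in agreement with Theorem 6.14; the same helpers also reproduce the constants 79 and 100 used in the proof.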

Now, we compare the estimates from Theorem 6.12 and Theorem 6.13. 
Take $n = 2^{128}$ and $\eta \in \mathrm{Alt}((\mathbb{F}_2)^{128})$ such that $o(\eta) > e^t$ ($o(\eta)$ even), where 
$t = \sqrt{(1/4)\, n \ln n} = \sqrt{(1/4)\, 2^{128} \ln(2^{128})}$. 

Since 

\[ t = \sqrt{2^{126} \cdot 128 \ln 2} = \sqrt{2^{133} \ln 2} = 2^{66}\sqrt{2 \ln 2}, \]

by replacing $e$ with $2^{\log_2 e}$, we obtain 

\[ e^t = 2^{t \log_2 e} = 2^{\epsilon\, 2^{66}}, \]

where $\epsilon = \sqrt{2 \ln 2}\,\log_2 e \in \mathbb{R}$ is circa 1.69. According to Theorem 6.14, the order of $\eta$ 
satisfies $o(\eta) > e^t = 2^{\epsilon 2^{66}}$. If $\mathrm{Alt}((\mathbb{F}_2)^{128}) \le GL((\mathbb{F}_2)^N)$, we then need the smallest 
N such that $o(\eta) \le 2^{N-1} - 2$ (Theorem 6.12). In other words we have to see 
when the following inequality holds: 

\[ 2^{\epsilon 2^{66}} \le 2^{N-1} - 2. \tag{18} \]

• if $N = 2^{66}$, then (18) is false, since $2^{\epsilon 2^{66}} > 2^{2^{66}} > 2^{N-1} - 2$; 

• if $N = 2^{67}$, then (18) is true, since $2^{\epsilon 2^{66}} < 2^{2^{67}-2} < 2^{N-1} - 2$. 

Therefore, we need at least $\ell \ge 67$ in order to embed $\mathrm{Alt}((\mathbb{F}_2)^{128}) \subset 
GL(V)$, which is exactly the same value as in Proposition 6.10. 
Remark 6.15. It is shown in Landau [Lan03] that the maximum order of an 
element in $\mathrm{Sym}((\mathbb{F}_2)^u)$ is asymptotic to $e^{\sqrt{n \ln n}}$ as $n \to \infty$ (with $n = 2^u$). 
Assuming this, we observe that we could slightly improve the value of $\ell$ we 
need to $\ell \ge 68$, which is the same as in Remark 6.11. 
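
The comparison of element orders reduces to comparing exponents of 2, which ordinary floating-point numbers handle comfortably. The following sketch assumes that inequality (18) has the form $2^{E} \le 2^{N-1} - 2$ with $N = 2^\ell$, where E is the base-2 logarithm of the lower bound on $o(\eta)$; the function `inequality_18` and its conservative decision rule are constructs of this example. The Landau variant treats the asymptotic estimate as if it were exact, as the remark itself does.

```python
import math

n = 2 ** 128
ln_n = 128 * math.log(2)
log2_e = math.log2(math.e)

# Base-2 exponents of e^t for the two estimates.
exponent_614 = math.sqrt(0.25 * n * ln_n) * log2_e     # Theorem 6.14
exponent_landau = math.sqrt(n * ln_n) * log2_e         # Landau's asymptotic

epsilon = exponent_614 / 2 ** 66


def inequality_18(exponent: float, ell: int) -> bool:
    """Decide 2^exponent <= 2^(N-1) - 2 for N = 2^ell, using only exponents."""
    big_n = 2 ** ell
    if exponent <= big_n - 2:      # 2^exponent <= 2^(N-2) <= 2^(N-1) - 2
        return True
    if exponent >= big_n - 1:      # 2^exponent >= 2^(N-1) > 2^(N-1) - 2
        return False
    raise ValueError("exponent too close to N - 1 to decide")


def smallest_ell(exponent: float) -> int:
    return next(ell for ell in range(2, 200) if inequality_18(exponent, ell))


print(epsilon, smallest_ell(exponent_614), smallest_ell(exponent_landau))
```

The first value is the constant $\epsilon$ (slightly below 1.7), and the two integers are the smallest exponents $\ell$ compatible with Theorem 6.12 under the two estimates.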